Abstract
Barrett's esophagus is a precancerous condition that can progress in a stepwise manner to dysplasia and eventually esophageal adenocarcinoma (EAC). Once diagnosed, patients with Barrett's esophagus are kept under surveillance to detect progression so that timely intervention with endoscopic therapy can occur. Several demographic and clinical risk factors, such as longer Barrett's segments, are known to increase progression toward EAC, and patients with these factors are kept on tighter surveillance. While p53 IHC has been advocated as an adjunct to histopathologic diagnosis, use of this biomarker is variable, and no other molecular factors are currently applied. Given the new evidence available, it is time to consider whether other risk factors or tools could be applied in clinical practice to decide on closer or attenuated surveillance. In this commentary, we summarize the most relevant risk factors for Barrett's esophagus progression, highlight the most promising novel risk stratification tools, including nonendoscopic triage and commercial biomarker panels, and propose a new framework for incorporating risk stratification into clinical practice.
The Clinical Challenge of Surveillance
The majority of our patients with Barrett's esophagus under surveillance never progress to cancer. Risk stratification is therefore essential to identify which patients could benefit from intensified surveillance and treatment and, conversely, which patients could be spared such frequent endoscopy. When considering which risk factors are relevant, it is important to recognize that not all risk factors for the development of Barrett's esophagus are relevant to the risk of progression, and so we have clarified the distinction (Fig. 1). In the past few years, risk stratification models using a wider range of risk factors have been proposed to refine decision making (1, 2), and there is emerging evidence that adding molecular markers could achieve higher diagnostic accuracy than our current surveillance methods. For example, although not currently used in clinical practice, incorporating factors that predict a low likelihood of progression, such as persistent non-dysplastic Barrett's esophagus (NDBE; ref. 3), might be particularly relevant for young patients with a long period of surveillance ahead. In this commentary, we discuss the factors that should play a key role in defining new risk stratification models. A wide range of molecular factors is available, but we focus here on assays that are either accessible in most standard clinical laboratories or commercially available.
In addition, current surveillance consists of endoscopy with Seattle protocol biopsies, which has relatively low sensitivity for detecting dysplasia because of sampling bias and the subjective nature of dysplasia assessment (4). Novel tools might improve on this through more operator-independent and comprehensive sampling of the Barrett's tissue and by incorporating molecular analysis. An assessment of the current evidence suggests that the time might be ripe for known molecular risk factors, such as p53, to become more clinically applicable in risk modeling for our patients with Barrett's esophagus.
In the final part of this commentary, we suggest a new path forward for implementing a more risk-tailored approach to surveillance in clinical practice. With careful implementation and audit, we have an opportunity to make progress in a field that currently performs unnecessary procedures in those at very low risk while still missing opportunities to improve outcomes for those at high risk.
Risk Categories for Barrett's Esophagus Neoplastic Progression
Given the highly variable progression rates among individuals diagnosed with Barrett's esophagus, risk factors for progression are an essential part of Barrett's esophagus surveillance decisions. Current Barrett's esophagus surveillance intervals are based on only two risk factors: Barrett's esophagus segment length and the histopathologic grade of dysplasia. The guidelines generally recommend close surveillance for indefinite for dysplasia (IND) and low-grade dysplasia (LGD), with treatment when LGD is confirmed on two occasions (5, 6).
Demographic factors
Barrett's esophagus and EAC are both more prevalent in White Caucasians; however, data on whether ethnicity plays a role in progression are limited (7). Male gender [OR, 2.12; confidence interval (CI), 1.78–2.52] and age (OR, 1.03 per unit increase; CI, 1.01–1.05) are associated with an increased risk of progression, at a level similar to that reported for Barrett's esophagus development (8). However, women are underrepresented in Barrett's esophagus cohorts, and these risk estimates may therefore be inaccurate. In a pooled study from six centers, the annual rate of progression to HGD/EAC was indeed significantly lower in women than in men (0.05% vs. 0.3%; N = 324 vs. 1,821), whereas the proportions of patients progressing to LGD were comparable (12.0% vs. 15.2%; ref. 9).
A positive family history is a strong risk factor for a diagnosis of Barrett's esophagus, but only one study has examined it in relation to neoplastic progression. In a large cohort of patients with known Barrett's esophagus or EAC, a possible association was suggested between early onset of disease and the number of affected family members. Patients with ≥3 affected family members were diagnosed with HGD or EAC at a younger age (56 years) than patients with fewer than two affected members (median age, 64 years); however, this relationship lost significance in multivariate logistic regression (10). A positive family history is infrequent and often inaccurately reported, limiting the value of this risk factor in risk stratification scores.
In terms of lifestyle factors, smoking is a weak-to-moderate risk factor (OR, 1.53; CI, 0.94–2.48) for progression to HGD/EAC, even when corrected for other confounders such as sex, age, and Barrett's esophagus length (8). Alcohol does not seem to play a role in progression, and although obesity has been associated with Barrett's esophagus development, current evidence shows no relation to neoplastic progression (8).
Endoscopic factors
Longer segment length is a well-known risk factor for neoplastic progression, but the cutoffs used in the guidelines seem somewhat arbitrary. The European Society of Gastrointestinal Endoscopy suggests 3-yearly surveillance for segments >3 cm and annual endoscopy for segments >10 cm, with the added suggestion that the latter be performed in Barrett's esophagus expert centers. The British Society of Gastroenterology recommends 2–3-yearly surveillance for segments >3 cm, and the American College of Gastroenterology suggests 3–5-yearly monitoring without specifying a length cutoff (5, 11, 12). One systematic review showed an OR of 1.25 (CI, 1.16–1.36) per centimeter for progression to HGD/EAC in patients with NDBE/LGD (8), while another systematic review, focusing only on patients with NDBE, showed annual progression rates of 0.24% for short-segment Barrett's esophagus (<3 cm) compared with 0.53% for long segments (≥3 cm; ref. 11). Although a longer Barrett's esophagus segment is a risk factor, there is little direct evidence on how to define long versus short segments, and larger cohort studies defining optimal cutoffs and excluding questionable diagnoses of Barrett's esophagus in segments <1 cm are required.
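A per-centimeter OR from a logistic model is multiplicative, so the implied OR between two segment lengths compounds rather than adds. The sketch below illustrates this arithmetic, assuming (as a simplification not established by the cited review) that the OR of 1.25 per cm applies uniformly across all lengths; the function name and the 10 cm and 3 cm comparison lengths are illustrative choices.

```python
def implied_odds_ratio(or_per_unit: float, value_a: float, value_b: float) -> float:
    """Implied odds ratio for value_a versus value_b, assuming a log-linear
    (multiplicative per-unit) effect, as in logistic regression."""
    return or_per_unit ** (value_a - value_b)


# Point estimate and CI bounds of the per-cm OR quoted in the text (ref. 8).
for label, or_per_cm in [("point estimate", 1.25), ("lower CI", 1.16), ("upper CI", 1.36)]:
    # Illustrative comparison: a 10 cm segment versus a 3 cm segment.
    print(f"{label}: 10 cm vs 3 cm OR = {implied_odds_ratio(or_per_cm, 10, 3):.2f}")
```

Under these assumptions, a 10 cm segment carries roughly 4.8 times the odds of progression of a 3 cm segment. Because annual progression is rare, odds ratios approximate relative risks here. The same function applies to the age OR of 1.03 per year, which gives about 1.34 for a 10-year difference.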
Histopathologic factors
Based on the Vienna classification, the diagnosis of IND is made when the pathologist encounters atypical cells and abnormal architecture that are insufficient for a diagnosis of dysplasia. Although progression to HGD/EAC is low in patients with IND, the risk is still about 3-fold higher than in NDBE, illustrating that this is a relevant risk factor. The risk fluctuates because of diagnostic uncertainty: a substantial proportion of patients have only active inflammation, whereas in others IND can represent genuine dysplasia (13, 14). Persistent IND is therefore a significant risk factor (OR, 3.2; CI, 1.04–9.98), and adequate acid suppression is the first step to help resolve the diagnostic uncertainty caused by inflammation after an IND diagnosis (15). An additional biomarker that can help distinguish true dysplasia from inflammatory atypia is aberrant expression of the p53 protein. p53 IHC has the potential to increase interobserver agreement between pathologists and lead to a more definitive diagnosis (16).
LGD is a strong risk factor for Barrett's esophagus progression, with reported rates of progression to HGD/EAC of between 4.2 and 6.7 per 100 person-years (8, 17, 18), but, as mentioned already, pathologic assessment is notoriously difficult, with a high rate of interobserver disagreement (19, 20). Confirmation by a second pathologist is key: with each additional pathologist confirming the degree of dysplasia, the annual neoplastic progression rate increases, reaching as high as 20% for HGD/EAC when confirmed by three expert pathologists (19). In addition, p53 IHC is useful for increasing interobserver agreement, and as routine clinical laboratories gain more experience with this highly specific biomarker, it may supersede the requirement for a second confirmatory procedure (21). Artificial intelligence-aided analysis of digital pathology slides might provide a more robust option for reducing this interobserver variability, but more evidence, with validation in independent datasets, is required (22). Hence, LGD confirmed on two consecutive endoscopies is currently, together with HGD/EAC, the only risk factor used to consider endoscopic treatment. Randomized trial data show that treatment of dysplastic Barrett's esophagus reduces the rate of neoplastic progression (23).
Risk Stratification in Clinical Practice: Triaging Upfront
On the basis of these known risk factors, several clinical risk stratification tools have been developed to aid decision making for individual patients. The clinical risk stratification model tested in the largest dataset was described by Parasa and colleagues, who analyzed 11 factors in 2,697 patients. Four risk factors were included in the final model: gender, smoking, Barrett's esophagus length, and LGD at baseline (1). Age, race, hiatus hernia, and use of proton pump inhibitors (PPI), NSAIDs, or aspirin were considered but were not found to be associated with progression. Three categories were created to predict annual progression risk: low (annual risk 0.13%), intermediate (0.73%), and high (2.10%), and discontinuation of surveillance was suggested for the low-risk group. This model was validated in two other studies (2, 24), both of which found a comparable AUC of 0.70, although in the study of Nguyen and colleagues this was achieved by regrouping the model output into high- and low-risk groups using a different cutoff. Overall, compared with the risk factors already used clinically (Barrett's esophagus length and dysplasia), adding male sex and smoking to the model provided an additional 21% stratification benefit (2). An advantage of clinical risk stratification tools is that calculators can now easily be designed, either as online tools or incorporated within the electronic patient record system, making them an attractive, quick, and easy addition to our decision making.
An adjunct to clinical risk factors could be to introduce a simple test for triage ahead of endoscopy. The Cytosponge-TFF3 test (Medtronic UK) is a nonendoscopic capsule-on-a-string device that retrieves cells from along the length of the esophagus for laboratory assessment of a panel of biomarkers. The device is well tolerated and is less invasive and less costly than a standard esophagogastroduodenoscopy (EGD) (25). Although originally trialed as a screening tool, the device's characteristics and biomarker flexibility make it an attractive tool for triaging patients undergoing Barrett's esophagus surveillance to identify those likely to benefit from endoscopy. In the initial surveillance panel, the Cytosponge was combined with a multidimensional biomarker panel consisting of clinical (age and Barrett's esophagus length), molecular (aurora kinase A and p53 IHC, and TP53 sequencing), and histologic (cytologic atypia) biomarkers; when fitted into a regression model, the panel predicted dysplasia with good accuracy (26). To facilitate clinical implementation, the panel was simplified, and in a retrospective cohort of 891 patients divided into training and validation sets, p53 IHC together with any cytologic atypia was shown to be highly effective for prioritizing patients for endoscopy, with an AUC for HGD/EAC of 0.86 in the validation cohort. When clinical risk factors (age, Barrett's esophagus segment length, and sex) were added to identify a moderate-risk group needing more frequent surveillance, the AUC for HGD/EAC increased to 0.91 (25). All of these analyses used endoscopic biopsies taken at the same endoscopy as the gold standard for direct comparison. This broadens the potential of the Cytosponge test for risk stratification in patients with Barrett's esophagus, and further prospective evaluation of this algorithm is ongoing (ISRCTN91655550).
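Such a calculator can be built as a points-based score. The sketch below shows the general shape of one that maps to the three annual-risk categories reported for the model of Parasa and colleagues (low 0.13%, intermediate 0.73%, high 2.10%). The point weights, cutoffs, and all names are hypothetical placeholders chosen only to illustrate the structure. They are not the published model coefficients and would have to be replaced with the published values before any real use.

```python
from dataclasses import dataclass

# HYPOTHETICAL point weights and cutoffs, for illustration only.
# These are NOT the published model coefficients.
HYPOTHETICAL_POINTS = {
    "male": 2.0,
    "ever_smoker": 1.0,
    "per_cm_barrett": 0.5,
    "baseline_lgd": 3.0,
}
HYPOTHETICAL_LOW_CUTOFF = 3.0   # score below this -> low risk
HYPOTHETICAL_HIGH_CUTOFF = 6.0  # score at or above this -> high risk

# Annual progression risk (%) per category, as reported in the text.
ANNUAL_RISK_PCT = {"low": 0.13, "intermediate": 0.73, "high": 2.10}


@dataclass
class Patient:
    male: bool
    ever_smoker: bool
    barrett_length_cm: float
    baseline_lgd: bool


def risk_score(p: Patient) -> float:
    """Sum the (hypothetical) points for the four model variables."""
    score = p.barrett_length_cm * HYPOTHETICAL_POINTS["per_cm_barrett"]
    if p.male:
        score += HYPOTHETICAL_POINTS["male"]
    if p.ever_smoker:
        score += HYPOTHETICAL_POINTS["ever_smoker"]
    if p.baseline_lgd:
        score += HYPOTHETICAL_POINTS["baseline_lgd"]
    return score


def risk_category(p: Patient) -> str:
    """Map the score to one of the three risk categories."""
    s = risk_score(p)
    if s < HYPOTHETICAL_LOW_CUTOFF:
        return "low"
    if s < HYPOTHETICAL_HIGH_CUTOFF:
        return "intermediate"
    return "high"


if __name__ == "__main__":
    patient = Patient(male=True, ever_smoker=False, barrett_length_cm=4, baseline_lgd=False)
    category = risk_category(patient)
    print(category, ANNUAL_RISK_PCT[category])
```

The weights and cutoffs are kept in named constants, separate from the logic. This mirrors how such a tool could be embedded in an electronic record system and updated once validated coefficients are available.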
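The AUC values quoted for the Cytosponge panel (0.86 and 0.91) summarize how well a score ranks patients with HGD/EAC above those without. The AUC equals the probability that a randomly chosen case scores higher than a randomly chosen non-case, with ties counting half. The sketch below computes this rank-based (Mann-Whitney) form directly. The scores are made-up toy values, not study data.

```python
from itertools import product


def auc(case_scores, control_scores):
    """Rank-based (Mann-Whitney) AUC: the fraction of case/control pairs in
    which the case scores higher; tied pairs contribute 0.5."""
    if not case_scores or not control_scores:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for case, control in product(case_scores, control_scores):
        if case > control:
            wins += 1.0
        elif case == control:
            wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Toy scores (NOT study data); higher means more suspicious for HGD/EAC.
cases = [0.9, 0.7, 0.4]
controls = [0.8, 0.3, 0.2, 0.1]
print(auc(cases, controls))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation. The pairwise loop is quadratic, which is adequate for illustration. Large cohorts would typically use a sort-based equivalent.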
Risk Stratification in Clinical Practice: Novel Biomarker Tools Applied to Endoscopy Samples
At the time of endoscopy, new wide-field sampling devices have emerged to supplement traditional biopsy, as have molecular biomarkers that can improve risk stratification. The biomarker field is vast, with new evidence emerging continually; here we focus on those that are available for clinical implementation (Table 1).
Tool | Model | Biomarkers | Sensitivity/specificity for predicting neoplastic progression to HGD/EAC | Costs |
---|---|---|---|---|
p53 | IHC staining | p53 | NDBE/LGD cohort: sensitivity 49%, specificity 86% (18) | $ |
TissueCypher | Immunofluorescence (IF) of molecular biomarkers + morphologic factors (nuclear size, shape, DNA amount) | p53, p16, AMACR, HER-2, CD68, COX2, HIF1 alpha, CD45RO | Mixed NDBE/IND/LGD cohort: high- and intermediate-risk classes combined, sensitivity 55%, specificity 82% (29) | $$$ |
Wats3D | Extensive Barrett's segment sampling + computer-aided analysis | – | 2.1% incremental yield in a mixed cohort with NDBE and all variants of dysplasia (34) | $$ |
Although aberrant protein expression of the tumor suppressor gene TP53 has long been recognized as a risk factor for malignant progression (18, 27), this biomarker has not yet been incorporated into routine clinical assessment of endoscopic biopsies taken during Barrett's esophagus surveillance. In a systematic review of 12 studies, aberrant p53 expression (overexpression or loss) was a significant risk factor for neoplastic progression, with a risk comparable to that of LGD (OR, 7.04; 95% CI, 3.68–13.46). This risk was independent of dysplasia status, with ORs of 8.64 (95% CI, 3.62–20.62) for LGD and 6.12 (95% CI, 2.99–12.52; ref. 27) for NDBE. Recently, the utility of p53 IHC was validated in a large prospective study including 1,438 patients with NDBE and LGD; using a more complex scoring system for the IHC staining, progression could also be predicted ahead of phenotypic evidence of dysplasia in some patients (28). Overall, combining standard p53 IHC scoring of endoscopic biopsy samples with clinical risk factors might refine predictions in a way that is highly affordable and achievable in mainstream practice. Moreover, wide-field sampling devices such as the Cytosponge or Wats3D could make p53 analysis more robust by sampling the whole Barrett's esophagus segment rather than individual biopsy sites.
TissueCypher (Castle Biosciences, Inc.) is a pathologic assessment tool that combines fluorescent imaging of biomarkers and morphologic factors, comprising 15 features, using standard paraffin-embedded biopsies. The analysis is performed in a centralized commercial laboratory and uses a machine learning algorithm that classifies patients into three risk groups (high, intermediate, and low). A recent pooled analysis of four case–control studies including NDBE, IND, and LGD showed 98% specificity of the high-risk category for predicting neoplastic progression, but sensitivity was low at around 38%. Sensitivity could be increased to 55% by combining the high- and intermediate-risk groups, at a loss of specificity (29). It should be noted that the same research group showed that, because of the high cost and complexity of the assay, cost-effectiveness is reached only at a sensitivity of at least 50% (30). For LGD, the advantages of the TissueCypher assay seem clearer: it could identify 50%–56% of progressors whose diagnosis had initially been downstaged to NDBE by the pathologist (31). The question remains whether the cost of the panel is justified by its value over p53 testing, which is part of the panel. In a subgroup analysis from one of the case–control studies, p53 was suggested to have a lower HR (2.18; CI, 0.99–4.76) than the TissueCypher test (4.43; CI, 2.32–8.46). However, this risk estimate for p53 is much lower than the ORs of 4.7–7 reported in other studies using conventional p53 IHC, which might be explained by the different immunofluorescence staining and scoring methodology used within the TissueCypher panel (32). Overall, the test could potentially have a clinical role in further risk stratifying patients with LGD, but its role in patients with NDBE requires further validation.
Wats3D (Wide Area Transepithelial Sampling with Three-Dimensional Computer-Assisted analysis; CDx Diagnostics) is an endoscopic brush that samples the Barrett's mucosa more deeply and over a wider surface area. The sample is submitted to a centralized laboratory, where computer-aided analysis of the cytologic specimen is performed, with confirmation of dysplastic cells by a pathologist. In a large multicenter study including both screening and surveillance cases, Wats3D led to an additional 213 dysplasia diagnoses, of which 10 were HGD/EAC cases "missed" by biopsies. However, adherence to the Seattle protocol was not monitored, and the authors therefore acknowledge that the benefit compared with expert endoscopy cannot be determined (33). A meta-analysis found an incremental yield of 7% for any type of dysplasia and 2% for HGD/EAC, but most of the included studies lacked endoscopic biopsy confirmation for comparison with the gold standard, and the Wats3D assay was negative in 62.5% of cases in which biopsies revealed dysplasia (34). Confirmatory review of the samples is also hampered because the images are created by three-dimensional reconstruction of slides and cannot be distributed for confirmation. Before wider clinical implementation, long-term results and head-to-head comparative studies are required to determine the added value of Wats3D over the conventional Seattle protocol.
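Whether a highly specific but insensitive test is useful in practice depends on how common progression is in the tested population. The sketch below applies Bayes' theorem to the TissueCypher figures quoted here: the high-risk class (sensitivity 38%, specificity 98%) and the combined high- and intermediate-risk classes (sensitivity 55%, specificity 82%; ref. 29). It assumes, purely for illustration, that 5% of tested patients progress. That prevalence is not a figure from the cited studies, whose case–control design enriches for progressors.

```python
def predictive_values(sensitivity: float, specificity: float, prevalence: float):
    """Return (PPV, NPV) for a binary test using Bayes' theorem."""
    tp = sensitivity * prevalence              # true positives
    fn = (1 - sensitivity) * prevalence        # false negatives
    tn = specificity * (1 - prevalence)        # true negatives
    fp = (1 - specificity) * (1 - prevalence)  # false positives
    return tp / (tp + fp), tn / (tn + fn)


ASSUMED_PREVALENCE = 0.05  # illustrative assumption, not from the cited studies

for label, sens, spec in [("high-risk class", 0.38, 0.98),
                          ("high + intermediate classes", 0.55, 0.82)]:
    ppv, npv = predictive_values(sens, spec, ASSUMED_PREVALENCE)
    print(f"{label}: PPV {ppv:.2f}, NPV {npv:.3f}")
```

At this assumed prevalence, the high-risk class gives a much higher PPV than the combined classes, while NPV barely changes. This is consistent with using a high-risk result to intensify management and with the caution about using a negative result to relax surveillance.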
Toward a New Framework for Surveillance Using Risk Stratification
The integration of these new devices into clinical practice raises several questions, such as the consequences of a positive result. In one of the TissueCypher studies, the authors suggested immediate treatment after reclassifying patients as high risk on the basis of the test result, regardless of the standard biopsy dysplasia assessment (35). Current guidelines recommend confirmation of LGD at two consecutive endoscopies and second review by an expert pathologist (5, 6). The question therefore arises of how to value these new device-based biomarkers: should they be considered complementary or equivalent? At some stage, to move the field forward, one has to take a view on whether the evidence is sufficient to warrant a new algorithm, while still auditing its performance. Given the high specificity of the Cytosponge and TissueCypher tests, we propose regarding them as an equivalent diagnostic strategy; in other words, if patients are categorized as high risk by either of these tests, then a single confirmatory endoscopic biopsy showing dysplasia (ideally with p53, as this is included in the Cytosponge, Wats3D, and TissueCypher tests) is required to initiate treatment. This also highlights another clinical dilemma: how to proceed if the assay classifies the patient as high risk but endoscopy is unremarkable. Disregarding the results of these highly specific devices would not be sensible, and we suggest classifying all such patients as clinically high risk and considering repeat endoscopy within 6–12 months.
The next consideration is whether, if we intensify surveillance and treatment in one group, we can do the opposite in low-risk categories and ease the burden of surveillance for these patients. For TissueCypher, the current low sensitivity does not justify this, and for the Cytosponge test, more evidence is needed to characterize the low-risk category and ensure that Cytosponge follow-up at longer intervals is safe.
One might also consider whether other clinical features could be integrated to justify an extended surveillance interval. For example, one proposed risk-reducing factor is persistent NDBE, but evidence has been conflicting. Two large studies of patients with NDBE with at least 1 year of follow-up divided patients into five groups according to the number of consecutive endoscopies (3, 36). In one study, incidence rates (IR) were calculated as the overall incidence from the confirmatory second endoscopy onward and decreased with each additional confirmatory endoscopy (IR for progression to HGD/EAC: 0.69, 0.50, 0.45, 0.45, and 0.22 per 100 person-years for 1 to 5 endoscopies; ref. 3). In the other study, the annual risk of progression to HGD/EAC was calculated between endoscopies and also decreased (0.75%, 0.57%, 0.41%, 0.44%, and 0% per year; ref. 36). Progression-free time therefore seemed to be a good indicator of a less aggressive disease course, and with a cutoff of 4 years, sensitivity and specificity were 90.4% and 72.4%, respectively (3). However, Kunzmann and colleagues found more variation in progression risk across the number of consecutive endoscopies and could not confirm a decreasing trend, perhaps because of smaller numbers (37). Prospective data from a well-defined cohort might provide definitive answers, which could lead to a substantial reduction in our surveillance endoscopies.
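The proposal for a high-risk assay result can be expressed as a small decision rule. The sketch encodes only what is proposed here. A high-risk result on a highly specific assay plus one confirmatory biopsy showing dysplasia leads to treatment consideration. A high-risk result with an unremarkable endoscopy leads to classification as clinically high risk and repeat endoscopy within 6–12 months. The enum, function, and argument names are hypothetical, and every other combination simply defers to current guidelines.

```python
from enum import Enum


class Action(Enum):
    CONSIDER_TREATMENT = "consider endoscopic treatment"
    REPEAT_ENDOSCOPY_6_12_MONTHS = "clinically high risk: repeat endoscopy within 6-12 months"
    FOLLOW_GUIDELINES = "follow current guideline pathway"


def next_step(assay_high_risk: bool, biopsy_confirms_dysplasia: bool) -> Action:
    """Hypothetical encoding of the proposed rule for highly specific assays
    (e.g. a Cytosponge or TissueCypher high-risk result)."""
    if assay_high_risk and biopsy_confirms_dysplasia:
        # A single confirmatory biopsy (ideally with p53) would suffice.
        return Action.CONSIDER_TREATMENT
    if assay_high_risk:
        # A highly specific positive result is not disregarded.
        return Action.REPEAT_ENDOSCOPY_6_12_MONTHS
    # Without a high-risk assay result, the usual pathway applies
    # (e.g. LGD confirmation at two endoscopies and expert review).
    return Action.FOLLOW_GUIDELINES


print(next_step(assay_high_risk=True, biopsy_confirms_dysplasia=False).value)
```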
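Incidence rates expressed per 100 person-years can be translated into the cumulative risk over a candidate surveillance interval. This helps frame what a longer interval would mean for a patient. The sketch below assumes a constant hazard over the interval, a simplifying assumption under which cumulative risk = 1 - exp(-rate × time). It applies this to the first and last incidence rates quoted above (0.69 and 0.22 per 100 person-years) over an illustrative 5-year interval.

```python
import math


def cumulative_risk(rate_per_100_py: float, years: float) -> float:
    """Cumulative probability of progression over `years`, assuming a
    constant incidence rate (exponential survival model)."""
    hazard_per_year = rate_per_100_py / 100.0
    return 1.0 - math.exp(-hazard_per_year * years)


for rate in (0.69, 0.22):
    print(f"{rate} per 100 person-years over 5 years: {cumulative_risk(rate, 5):.2%}")
```

At rates this low, the result is close to the simple product rate × time. The exponential form matters more for longer intervals or higher-risk groups.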
On the basis of the data available so far, we propose a framework (Fig. 2) to guide further discussion for patients diagnosed with Barrett's esophagus after an initial EGD. Scenario A is the current setting, in which the surveillance interval is determined only by the grade of dysplasia and Barrett's segment length, and the exact intervals vary according to local guidance. In scenario B, routinely adding p53 IHC to endoscopic biopsies for IND and LGD cases would be a relatively inexpensive and straightforward intervention to increase diagnostic certainty for true dysplasia. Moreover, p53 IHC could be added for all patients, including NDBE cases, to identify a confident low-risk category of patients who, after repeated dysplasia-negative and p53-negative endoscopies, might qualify for a lenient surveillance schedule, although this would add costs. Scenario C adds novel tools such as TissueCypher or Wats3D to the standard EGD and thereby focuses mainly on optimizing the high-risk category. This adds higher costs if applied to all patients, and additional data are therefore required to determine which patients warrant these additional laboratory tests compared with scenarios A and B. In scenario D, Cytosponge triage combined with clinical risk factors (Barrett's esophagus segment length, age, and male gender) could be used as an upfront tool to stratify patients with Barrett's esophagus into low- and moderate-risk groups with different follow-up frequencies. This would substantially reduce the population requiring endoscopy and thus enrich the likelihood of finding dysplasia in those referred for endoscopy. Because of the reduced endoscopy workload, one could also consider placing patients with aberrant Cytosponge biomarkers (atypia and p53) on a dedicated list with sufficient time and an experienced endoscopist using advanced imaging tools to localize the dysplasia, which might help reduce sampling bias.
In the new scenarios B and C, patients with high-risk biomarkers could undergo surveillance according to current societal recommendations for patients with high-risk Barrett's esophagus, for example, a repeat endoscopy in 6 months. The moderate-risk category could keep the regular surveillance interval of 3–5 years, and the low-risk category might be allowed a lenient schedule, doubling current intervals to 6 and 10 years, with possible discharge after repeated negative surveillance. More data are required to determine surveillance intervals for the Cytosponge strategy; one could envisage that patients in the low-risk category could undergo surveillance by Cytosponge, with endoscopy triggered only by positive Cytosponge biomarkers, whereas the moderate-risk group could alternate between Cytosponge and Seattle protocol endoscopy.
For all these innovations, we recognize knowledge gaps and acknowledge that data for the new devices and assays need further validation. Health economic studies are also required to determine the optimal trade-off between health care costs for the health care system and the risks and benefits of the monitoring procedures. However, with thoughtful implementation research, we can start gathering these data on the scale required to obtain answers without delaying progress.
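The interval proposals for scenarios B and C can be summarized as a lookup. In the sketch below, the 6-month, 3–5-year, and doubled 6/10-year values come from the text. Representing each category as a (shorter, longer) pair of intervals in years, and the constant and function names, are illustrative assumptions. Local guidance (for example, segment length) would determine which end of a pair applies.

```python
# Current regular surveillance interval range (years), as stated in the text.
CURRENT_INTERVAL_YEARS = (3.0, 5.0)

# Proposed intervals for scenarios B and C; the (shorter, longer) pair
# representation is an illustrative assumption.
PROPOSED_INTERVAL_YEARS = {
    "high": (0.5, 0.5),                                   # repeat endoscopy in about 6 months
    "moderate": CURRENT_INTERVAL_YEARS,                   # keep the regular 3-5 year interval
    "low": tuple(2 * y for y in CURRENT_INTERVAL_YEARS),  # doubled to 6 and 10 years
}


def proposed_interval(category: str) -> tuple:
    """Return the (shorter, longer) proposed interval in years for a risk category."""
    if category not in PROPOSED_INTERVAL_YEARS:
        raise ValueError(f"unknown risk category: {category!r}")
    return PROPOSED_INTERVAL_YEARS[category]


for cat in ("high", "moderate", "low"):
    print(cat, proposed_interval(cat))
```

Deriving the low-risk pair from the current interval keeps the "doubling" rule explicit. If local guidance changes the regular interval, the lenient schedule follows automatically.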
Authors' Disclosures
R.C. Fitzgerald reports a patent for Cytosponge-TFF3 licensed to Medtronic and is a co-founder and shareholder (<3%) of Cyted Ltd. No disclosures were reported by the other author.
Acknowledgments
J. Honing is a Clinical Fellow supported by the Bernard Wolfe Health Neuroscience Fund with infrastructure support for clinical research from the CRUK Cambridge Centre and the NIHR Biomedical Research Centre.
R.C. Fitzgerald is supported by an MRC Programme Grant and infrastructure support from the CRUK Cambridge Centre and the NIHR Biomedical Research Centre.