Abstract
Background: Better methods are needed to predict risk of progression for Barrett's esophagus. We aimed to determine whether a tissue systems pathology approach could predict progression in patients with nondysplastic Barrett's esophagus, indefinite for dysplasia, or low-grade dysplasia.
Methods: We performed a nested case–control study to develop and validate a test that predicts progression of Barrett's esophagus to high-grade dysplasia (HGD) or esophageal adenocarcinoma (EAC), based upon quantification of epithelial and stromal variables in baseline biopsies. Data were collected from Barrett's esophagus patients at four institutions. Patients who progressed to HGD or EAC in ≥1 year (n = 79) were matched with patients who did not progress (n = 287). Biopsies were assigned randomly to training or validation sets. Immunofluorescence analyses were performed for 14 biomarkers and quantitative biomarker and morphometric features were analyzed. Prognostic features were selected in the training set and combined into classifiers. The top-performing classifier was assessed in the validation set.
Results: A 3-tier, 15-feature classifier was selected in the training set and tested in the validation set. The classifier stratified patients into low-, intermediate-, and high-risk classes [HR, 9.42; 95% confidence interval, 4.61–19.24 (high-risk vs. low-risk); P < 0.0001]. It also provided independent prognostic information that outperformed predictions based on pathology analysis, segment length, age, sex, or p53 overexpression.
Conclusion: We developed a tissue systems pathology test that better predicts risk of progression in Barrett's esophagus than clinicopathologic variables.
Impact: The test has the potential to improve upon histologic analysis as an objective method to risk stratify Barrett's esophagus patients. Cancer Epidemiol Biomarkers Prev; 25(6); 958–68. ©2016 AACR.
Introduction
Barrett's esophagus is a precursor to esophageal adenocarcinoma (EAC). Although the risk of progression of Barrett's esophagus to EAC is very low (1–3), treatment options for advanced EAC are limited and early detection is critical for optimal patient management. EAC can be prevented if dysplasia is detected and treated early with endoscopic therapies such as radiofrequency ablation (RFA) and/or endoscopic mucosal resection (EMR; refs. 4–6). Despite endoscopic surveillance programs, the increasing incidence of EAC remains a health concern (7). Accurate tests are needed to identify Barrett's esophagus patients who are at high risk for progression and require therapeutic intervention, as well as to recognize low-risk patients for whom the frequency of surveillance could potentially be reduced.
Current practice guidelines recommend endoscopic surveillance with biopsies at frequencies determined by the histologic diagnosis (8). However, the histologic evaluation is limited by interobserver variation and random sampling (9–12). Furthermore, the molecular and cellular changes associated with malignant transformation can precede the morphologic changes that pathologists can evaluate by histology (13). Efforts have long been underway to identify risk prediction biomarkers in Barrett's esophagus. This concept has become more important with the advent of highly effective endoscopic therapies such as RFA and EMR. Many biomarkers have been evaluated in Barrett's esophagus (14–17) but risk prediction biomarkers have been difficult to identify. The British Society of Gastroenterology (BSG) recommends use of p53 IHC to aid diagnosis (18); however, biomarkers for accurate risk prediction have not been validated to date. The complex structure of tissues highlights the need for an alternative systems biology approach (19). Assessment of tissues as a “system” has the potential to improve upon current tools by quantifying genetic, immunologic, vascular, and morphologic features relevant to patient outcomes. This tissue systems pathology approach has been demonstrated to have potential diagnostic applications in Barrett's esophagus (20). This approach may also have prognostic applications by objectively quantifying multiple molecular and cellular features that precede definitive morphologic changes. The aim of this study was to develop and validate a tissue systems pathology test that predicts future risk of progressing to high-grade dysplasia (HGD)/EAC in patients with Barrett's esophagus.
Materials and Methods
Study design and patients
A nested case–control study was constructed that utilized a multicenter cohort of Barrett's esophagus patients in surveillance programs with clinical outcome data from four institutions. Barrett's esophagus cases with a diagnosis of nondysplasia (ND), indefinite for dysplasia (IND), or low-grade dysplasia (LGD) were retrieved from Geisinger Health System, University of Pittsburgh (Pittsburgh, PA), University of Pennsylvania (Philadelphia, PA), and Academic Medical Center (AMC; Amsterdam, The Netherlands). The diagnoses were confirmed by a single gastrointestinal subspecialist pathologist at each U.S. institution (J.M. Davison, N.C. Jhala, J. Li). Inclusion criteria were availability of tissue and clinicopathologic data and confirmation of Barrett's esophagus. Exclusion criteria were history of HGD/EAC, diagnosis of HGD/EAC in less than 1 year, insufficient tissue quality as assessed by a pathologist, and preparation of tissue with Bouin fixative or methylene blue. The earliest surveillance biopsy that satisfied inclusion/exclusion criteria was selected for each patient. Additional information can be found in Supplementary Methods. Patients who progressed to HGD/EAC in ≥1 year (incident progressors/cases, n = 41 training, n = 38 validation) were matched to nonprogressor controls with median HGD/EAC-free surveillance time of 5.6 years (n = 142 training, n = 145 validation) based on gender, segment length, and age where possible, and also by location (i.e., U.S. cases matched to U.S. controls and the Netherlands cases matched to the Netherlands controls). Case–control sets from the United States and the Netherlands institutions were randomly assigned to training or validation sets (Table 1). In the training set, 33 of 41 progressor patients progressed to HGD and 8 of 41 to EAC; in the separate validation set, 29 of 38 progressor patients progressed to HGD and 9 of 38 to EAC (Supplementary Table S1).
Data elements collected were: case collection date, original pathologic diagnosis provided by a generalist pathologist and gastrointestinal subspecialist review diagnosis for the case tested, date and original diagnosis of every surveillance biopsy, progression endpoint (HGD/EAC), HGD/EAC-free surveillance time (time between case tested and HGD/EAC diagnosis or last follow-up), age, sex, race, and segment length (short ≤ 3 cm, long > 3 cm). The study was approved by the institutional review boards at each institution.
Candidate biomarker selection
The following candidate panel of 14 protein biomarkers was selected and examined in the study: K20, Ki-67, β-catenin, p16INK4a, AMACR, p53, HER2/neu, CDX-2, CD68, NF-κB-p65, COX-2, HIF1α, CD45RO, and CD1a. The biomarkers included markers of epithelial cell abnormalities described in the progression of Barrett's esophagus and also stromal biomarkers known to play a role in carcinogenesis (14, 21–33).
Fluorescence immunolabeling
Five-micron sections of formalin-fixed paraffin-embedded (FFPE) Barrett's esophagus biopsies were stained with hematoxylin and eosin (H&E) by standard methods. Additional sections were labeled by multiplexed immunofluorescence for the candidate biomarker panel, plus Hoechst, according to previously described methods (20). The biomarkers were multiplexed in subpanels consisting of Hoechst and 3 biomarkers/slide detected via Alexa Fluor-488, -555, and -647-conjugated secondary antibodies (Life Technologies).
Whole slide imaging
H&E-stained slides were imaged at 20× magnification on a NanoZoomer Digital Pathology scanner (Hamamatsu Photonics, K.K.). Fluorescently immunolabeled slides were imaged at 20× on a ScanScope FL (Aperio Technologies/Leica BioSystems) with a standardized imaging procedure as described previously (20).
Image analysis
Whole slide fluorescence images were analyzed using the TissueCypher Image Analysis Platform (Cernostics, Inc.), which utilizes algorithms for collection of quantitative biomarker and morphology feature data at the cellular and subcellular level, and within tissue compartments such as epithelium, metaplasia, and lamina propria. Features (continuous, quantitative measurements of biomarkers and/or morphology) included biomarker intensities and coexpression within subcellular and tissue compartments, morphometrics, and microenvironment-based measurements, as described previously (20). A total of 1,184 features/biopsy were extracted from the biomarkers and morphology by the software and summarized as multiple measures (percentiles, IQR, percent positive, spatial summary statistics), resulting in 13,538 feature/measures per biopsy. The image analysis software was blinded to case–control status.
Statistical analyses
A risk prediction classifier was developed within the training set and prospectively defined prior to testing in the validation set. We tested the hypotheses that patients in the predicted low-risk class have significantly lower risk for progression to HGD/EAC than patients in the predicted high-risk class, and that the risk classes would add independent prognostic information beyond that of the pathologic diagnosis and segment length.
Development of risk prediction model.
Univariate conditional logistic regression (CLR) was performed in the training set with the 13,538 feature/measures to compare nonprogressors to progressors and enable feature selection for multivariable model building. Selected features were combined into classifiers, and leave-one-out cross-validation (LOOCV) was performed to estimate the prognostic performance of the classifiers. In each iteration of LOOCV, one case–control group (a progressor and the matched nonprogressors) was set aside as the testing cases and the remaining case–control groups were used as the training set. The feature/measures were selected and the prediction model was built in the training set as the sum of the features weighted by their univariate Cox coefficients; this model was then applied to the testing cases to calculate a score. Univariate coefficients were used because a multivariate model with many independent variables can suffer from overfitting and thus capture spurious relationships among the independent variables, which may lead to decreased performance when the prediction model is tested in separate cohorts (34). The LOOCV process was repeated until every case–control group had been treated as the testing case once. The end result of LOOCV was a risk score for each patient ranging from 0 to 10. Survival time for Cox proportional hazards regression was defined as the time between the case tested and the diagnosis of HGD/EAC for progressors or last follow-up for nonprogressors. Cox regression was used only after the features had been selected (by CLR), to derive the weights with which the selected features were combined into a risk score. Concordance indices (C-indices) were calculated, and ROC curves based on the binary outcome of low/high for 5-year risk of progression were plotted.
Cutoffs were determined to stratify patients into low-, intermediate-, and high-risk classes. Two cutoffs were chosen sequentially. The first cutoff, separating the low-risk group from the rest, was chosen to correspond to a 95% negative predictive value (NPV), unadjusted for disease prevalence, to ensure that the patients scored low-risk had very low risk for progression. The second cutoff, separating the high-risk group from the rest, was chosen to correspond to a 65% positive predictive value (PPV), unadjusted for disease prevalence, to ensure that the majority of the progressors would be captured by the high-risk group. Kaplan–Meier (KM) curves were used to represent the probability of progression in the three risk classes. HRs with 95% confidence intervals (CI) were calculated from Cox regression and ORs with 95% CI were calculated from the CLR. The log-rank test was used to assess the equality of the probability-of-progression curves of the risk groups from KM analysis, whereas the score test was performed with CLR to examine the significance of the association of the risk groups with incidence of HGD/EAC.
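The sequential cutoff selection described above can be sketched in a few lines of Python. This is a minimal illustration and not the study's implementation: the function names, the toy scores, and the tie-breaking rules (the largest threshold that still meets the NPV target, then the smallest higher threshold that meets the PPV target) are assumptions made for the example. NPV and PPV are computed directly in the sample, that is, unadjusted for disease prevalence.

```python
from typing import List, Optional, Tuple


def npv_below(scores: List[float], progressed: List[bool], cutoff: float) -> Optional[float]:
    """NPV of the group scoring below `cutoff`: fraction that did not progress."""
    group = [p for s, p in zip(scores, progressed) if s < cutoff]
    if not group:
        return None
    return sum(1 for p in group if not p) / len(group)


def ppv_at_or_above(scores: List[float], progressed: List[bool], cutoff: float) -> Optional[float]:
    """PPV of the group scoring at or above `cutoff`: fraction that progressed."""
    group = [p for s, p in zip(scores, progressed) if s >= cutoff]
    if not group:
        return None
    return sum(1 for p in group if p) / len(group)


def choose_cutoffs(
    scores: List[float],
    progressed: List[bool],
    npv_target: float = 0.95,
    ppv_target: float = 0.65,
) -> Tuple[Optional[float], Optional[float]]:
    """Pick the low-risk cutoff first, then the high-risk cutoff above it."""
    candidates = sorted(set(scores))

    # Step 1 (assumed rule): largest threshold whose low-risk group meets the NPV target.
    low_cut = None
    for c in candidates:
        value = npv_below(scores, progressed, c)
        if value is not None and value >= npv_target:
            low_cut = c

    # Step 2 (assumed rule): smallest threshold above low_cut whose
    # high-risk group meets the PPV target.
    high_cut = None
    for c in candidates:
        if low_cut is not None and c <= low_cut:
            continue
        value = ppv_at_or_above(scores, progressed, c)
        if value is not None and value >= ppv_target:
            high_cut = c
            break
    return low_cut, high_cut


def classify(score: float, low_cut: float, high_cut: float) -> str:
    """Assign one of the three risk classes from the two cutoffs."""
    if score < low_cut:
        return "low"
    if score >= high_cut:
        return "high"
    return "intermediate"


# Hypothetical toy data: 12 patients, risk scores on a 0-10 scale.
toy_scores = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
toy_progressed = [False, False, False, False, False, False,
                  True, False, True, False, True, True]

if __name__ == "__main__":
    low, high = choose_cutoffs(toy_scores, toy_progressed)
    print(low, high, [classify(s, low, high) for s in toy_scores])
```

On the toy data the sketch places the low-risk cutoff at 4.0 and the high-risk cutoff at 6.0. With real data, other selection rules that meet the same NPV and PPV targets are possible.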
Validation of risk prediction model.
The separate validation set was quarantined during the training phase. Sample size calculations indicated that a total of 43 patients were required in the validation set to ensure 80% power to detect a significant difference of 50% in the 5-year risk of progression to HGD/EAC between those classified as high-risk versus low-risk (as was observed in the training set), at 0.05 significance level. All assay parameters were prespecified during the training phase and locked down prior to testing in the validation set. The risk score for each patient was calculated and risk classes were assigned on the basis of the prespecified cutoffs. Prevalence-adjusted NPV and PPV were calculated for the low- and high-risk groups based on previously reported progression rates (35). The intermediate-risk group was not included in calculations of NPV and PPV.
Comparison of risk prediction model versus clinical variables.
Multivariate Cox models were fitted to assess whether the test added independent prognostic information beyond traditional clinical factors. Diagnosis (LGD vs. ND/IND combined), sex (0 = F, 1 = M), and segment length (0 = short, 1 = long) were dichotomized. Percent cells overexpressing p53 [determined by image analysis software, as described previously (20)] and age were evaluated as continuous variables. Patients with missing segment length and/or original generalist diagnosis were excluded from the multivariate Cox models.
Results
Patients
The nested case–control cohort included preprogression samples from 79 (n = 41 in training, n = 38 in validation) Barrett's esophagus patients with ND, IND, or LGD who progressed to HGD or EAC at least 1 year later and 287 samples from matched control patients who did not show progression (n = 142 in training, n = 145 in validation), as summarized in Table 1. Case–control sets from the United States and European institutions were randomly assigned to the training or validation set. The nonprogressor patients had median HGD/EAC-free surveillance time of 5.9 [interquartile range (IQR) 4.5–8.2] and 5.5 years (IQR 4.1–8.5) in the training and validation sets, respectively. The median time-to-progression was 2.9 (IQR 2.3–3.7) and 2.8 (IQR 2.0–4.2) years in the training and validation sets, respectively.
Development of 15-feature classifier in the training set
Slide images of multiplexed biomarker labeling were analyzed by the image analysis software to generate 13,538 feature/measures per biopsy in the training set. A set of 17 image analysis features derived from p53, HER2, Ki-67, K20, COX-2, CD68, HIF1α, p16INK4A, AMACR, CD45RO, and nuclear morphology were selected on the basis of P values from univariate CLR comparing cases versus controls (Table 2). Features derived from CDX2, β-catenin, CD1a, and NF-κB-p65 were not selected because of low ranking based on P values. The FDR for the 17 selected features was 0.025%. Therefore, the likelihood that a feature was selected by random chance was negligible. The top 3, 6, 9, 12, 15, and 17 feature/measures based on P values from CLR were scaled using center and scale parameters derived from the training set. The weighted (by univariate Cox model coefficients) sum was calculated to produce a risk score. Ninety-eight percent of the raw scores ranged from −10 to 10. Scaling was performed as follows:
Biomarker | Image analysis feature | P | P value adjusted by diagnosis | Coefficient | Utilized by 15-feature risk classifier |
---|---|---|---|---|---|
p53 | p53 nuclear sum intensity | 3.81E−05 | 5.81E−05 | −8.04439 | √ |
p53 | p53 nuclear mean intensity | 7.48E−05 | 0.000116779 | 6.358257 | √ |
HER2/neu and K20 | Ratio of mean HER2/neu intensity:mean K20 intensity in nuclei clusters | 0.000155084 | 0.000312461 | 4.547325 | √ |
HER2/neu and K20 | Ratio of 95th quantile HER2/neu intensity:95th quantile K20 intensity in nuclei clusters | 0.000318651 | 0.00064884 | 4.286031 | √ |
COX-2 and CD68 | Coexpression cellular COX-2 mean intensity and cellular CD68 mean intensity | 0.00046706 | 0.000911804 | −0.02203 | √ |
p53 | p53 mean intensity in nuclei clusters | 0.000501393 | 0.000498535 | 3.099642 | √ |
p53, p16, and nuclear morphology (solidity) | Nuclear solidity in p53+ p16− cells | 0.000866951 | 0.00126554 | 15.62477 | √ |
CD45RO | CD45RO plasma membrane sum intensity | 0.000873155 | 0.002141764 | −3.76449 | √ |
AMACR | AMACR microenvironment SD | 0.000924663 | 0.001502611 | 0.000789 | √ |
COX-2 | COX-2 texture in cytoplasm^a | 0.001318862 | 0.001909437 | 10.39816 | √ |
HIF1α | HIF1α microenvironment cell mean intensity | 0.001758646 | 0.001040583 | 0.000349 | √ |
HIF1α | HIF1α microenvironment cell moment (product of mean and standard deviation) | 0.002189596 | 0.002420038 | 1.02E−06 | √ |
p16 | p16 cytoplasm mean intensity | 0.00352243 | 0.003240205 | −4.98699 | √ |
p53, p16, and nuclear morphology (area) | Nuclear area in p53+ p16− cells | 0.00386791 | 0.005517011 | 0.014368 | √ |
Nuclear morphology | Hoechst nuclear 95th quantile intensity | 0.022822291 | 0.03429373 | 10.78732 | √ |
Ki-67 and K20 | Ratio of 95th quantile Ki-67 intensity:95th quantile K20 intensity in nuclei clusters | 0.279157 | 0.340456 | 0.95634 | x |
Ki-67 and K20 | 95th quantile Ki-67 intensity in nuclei of metaplastic cells | 0.10297 | 0.053479 | 1.73172 | x |
NOTE: Univariate conditional logistic regression was performed with the 13,538 feature/measures extracted by the image analysis to compare nonprogressors (controls) versus incident progressors (cases) in the training set of Barrett's esophagus patients. The table lists the selected subset of 17 features derived from 10 biomarkers and nuclear morphology that showed significant differences in incident progressors versus nonprogressors. This set of 17 features was entered into the multivariable model building. P values shown are estimated from the conditional logistic regression. Coefficients were derived from Cox proportional hazards regression of each feature/measure.
^a Contrast textural feature is extracted from a co-occurrence matrix and is a measure of the COX-2 intensity contrast between a pixel and its neighbor over the whole tissue image, as described by Haralick and colleagues (43).
LOOCV from feature/measure selection to predictive model building was conducted to evaluate the performance of the risk classifier in the training set. Using risk scores generated by these classifiers through LOOCV, C-indices for the top 3, 6, 9, 12, 15, and 17 features were 0.674, 0.672, 0.716, 0.755, 0.797, and 0.792, respectively, demonstrating that the top performing model was based on 15 feature/measures. The 15 feature/measures were derived from p53, HER2, K20, COX2, CD68, HIF1α, p16INK4A, AMACR, CD45RO, and nuclear morphology and included multiple features derived from individual biomarkers (Table 2). The correlation among features derived from the same candidate biomarker is summarized in Supplementary Table S2, which demonstrates weak to moderate correlations except for two p53-related features showing relatively larger correlation (cor = 0.82), but not to the extent of being collinear. A flowchart detailing the classifier development is shown in Supplementary Fig. S1. Expression patterns of the biomarkers on which the classifier is based are shown in Supplementary Fig. S2. AUROC for the classifier was 0.872 in patients from all four institutions (Fig. 1A), 0.842 in U.S. patients, and 0.870 in AMC patients, indicating high prognostic accuracy.
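To make the two quantities in this paragraph concrete, the sketch below computes a weighted-sum risk score and a concordance index in Python. It rests on assumptions that are not confirmed by the text: features are scaled as (value − center) / scale, the C-index is Harrell's estimator with score ties counted as one half, and all names and numbers are hypothetical. The rescaling of raw scores to the 0 to 10 range is not reproduced here.

```python
from typing import Sequence


def risk_score(
    features: Sequence[float],
    centers: Sequence[float],
    scales: Sequence[float],
    weights: Sequence[float],
) -> float:
    """Weighted sum of scaled features.

    `centers` and `scales` stand for parameters derived from the training set,
    `weights` for univariate Cox coefficients. The (x - center) / scale form
    is an assumption made for this sketch.
    """
    return sum(w * (x - c) / s for x, c, s, w in zip(features, centers, scales, weights))


def concordance_index(
    times: Sequence[float],
    events: Sequence[bool],
    scores: Sequence[float],
) -> float:
    """Harrell's C-index.

    A pair (i, j) is comparable when patient i had the event and did so
    strictly earlier than patient j's event or censoring time. The pair is
    concordant when patient i also has the higher risk score.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if scores[i] > scores[j]:
                    concordant += 1.0
                elif scores[i] == scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


if __name__ == "__main__":
    # Hypothetical toy cohort: two progressors (events) and two nonprogressors.
    toy_times = [2.0, 3.0, 5.0, 6.0]        # years to HGD/EAC or last follow-up
    toy_events = [True, True, False, False]
    toy_scores = [8.0, 3.0, 4.0, 1.0]
    print(concordance_index(toy_times, toy_events, toy_scores))
```

A C-index of 0.5 corresponds to random ordering and 1.0 to perfect ordering of patients by risk. In the toy cohort, four of the five comparable pairs are ordered correctly.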
Two cutoffs were chosen to produce a 3-tier classifier. KM plots of the 5-year probability of progression to HGD/EAC in patients scored as low-, intermediate-, and high-risk demonstrated that the classifier stratified progressors from nonprogressors in all institutions combined and in United States and AMC patients separately (Fig. 1B–D). HRs were 4.19 (95% CI, 1.52–11.57) for intermediate- versus low-risk and 14.73 (95% CI, 6.55–33.16) for high- versus low-risk. Both the log-rank and score tests showed that the 3 predicted risk classes had different risk for progression to HGD/EAC (P < 0.0001).
The molecular and cellular changes that are captured by the classifier are illustrated in Fig. 2, which compares endoscopy images, H&E-stained biopsy images, and images of multiplexed fluorescence labeling of the 9 protein candidate biomarkers from which the 15 features are derived in a progressor (Fig. 2A–C) and a nonprogressor (Fig. 2D–F). Endoscopy images for both patients showed Barrett's esophagus with no apparent lesions [Fig. 2A (progressor) and D (nonprogressor)]. Biopsies from both patients were confirmed as ND by a gastrointestinal subspecialist [Fig. 2B (progressor) and E (nonprogressor)]. The classifier scored the progressor high-risk due to multiple molecular and cellular changes (Fig. 2C). The nonprogressor was scored low-risk due to absence of high-risk features (Fig. 2F).
In multivariate Cox models in which progression to HGD/EAC was evaluated first in relation to clinical variables alone, then in relation to the predicted risk classes added to the clinical variables, the intermediate-risk and high-risk classes provided prognostic power that was independent of the pathologist's diagnosis (generalist and subspecialist), segment length, age, sex, and percent cells overexpressing p53 (Supplementary Table S3A and S3B). The magnitude of HRs indicated that the predicted risk classes provided stronger prognostic power than the clinical variables (Supplementary Table S3). Similar results were observed when the 15-feature risk score was evaluated as a continuous variable (Supplementary Table S4).
Performance of 15-feature classifier in the separate validation set
All parameters of the test were locked down prior to testing in the separate validation set. The prospectively defined test was then evaluated in the separate validation set of patients (Table 1). ROC analysis showed that the prespecified 15-feature classifier predicted 5-year risk of progression to HGD/EAC with AUROCs of 0.804 in patients from all four institutions (Fig. 3A), 0.860 in U.S. patients and 0.717 in AMC patients. In a post hoc analysis in the validation set, the C-indices for the top 3-, 6-, 9-, 12-, and 15-feature classifiers were 0.69, 0.659, 0.7, 0.758, and 0.772, respectively, confirming that the 15-feature classifier was the top performing risk prediction model in the separate validation set. KM analysis demonstrated that the classifier could distinguish progressors from nonprogressors in the full validation set and in U.S. and AMC patients separately (Fig. 3B–D), validating the classifier performance that was observed in the training set. HRs were 2.45 (95% CI, 0.99–6.07) for the comparison of the intermediate-risk versus low-risk group and 9.42 (95% CI, 4.61–19.24; Fig. 3E), for high-risk versus low-risk (P < 0.0001 for log-rank and score tests). Seventeen of the progressors scored high-risk, 7 scored intermediate-risk, and 14 scored low risk in the separate validation set. Nine of the nonprogressors scored high-risk, 21 scored intermediate-risk, and 115 scored low-risk. The probability of progression to HGD/EAC by 5 years increased continuously as the 15-feature risk score increased (Fig. 3F). Prevalence-adjusted NPV and PPV were 0.98 and 0.26 using reported progression rates (35). The prevalence-adjusted proportions of patients scored low-, intermediate-, and high-risk were 77%, 15%, and 8%, respectively.
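Because a case–control sample overrepresents progressors, predictive values have to be reweighted to a realistic prevalence. The Python sketch below applies Bayes' rule to the validation-set counts given above. It is an illustration under stated assumptions and not the authors' calculation: the function name is hypothetical, and the 5% five-year prevalence is an illustrative value chosen for the example (the rate actually used comes from ref. 35 and is not restated in this text).

```python
from typing import Dict, Tuple

CLASSES = ("low", "intermediate", "high")


def prevalence_adjusted(
    progressor_counts: Dict[str, int],
    nonprogressor_counts: Dict[str, int],
    prevalence: float,
) -> Tuple[float, float, Dict[str, float]]:
    """Return (NPV of low-risk class, PPV of high-risk class, class proportions).

    Class frequencies are estimated separately within progressors and
    nonprogressors, then mixed according to the assumed population prevalence.
    """
    n_prog = sum(progressor_counts.values())
    n_non = sum(nonprogressor_counts.values())

    # Joint probabilities P(class, progressor) and P(class, nonprogressor).
    joint_prog = {k: prevalence * progressor_counts[k] / n_prog for k in CLASSES}
    joint_non = {k: (1.0 - prevalence) * nonprogressor_counts[k] / n_non for k in CLASSES}

    proportions = {k: joint_prog[k] + joint_non[k] for k in CLASSES}
    npv = joint_non["low"] / proportions["low"]
    ppv = joint_prog["high"] / proportions["high"]
    return npv, ppv, proportions


# Validation-set counts as reported in the paragraph above.
progressors = {"low": 14, "intermediate": 7, "high": 17}
nonprogressors = {"low": 115, "intermediate": 21, "high": 9}

if __name__ == "__main__":
    # prevalence=0.05 is an assumed, illustrative value.
    npv, ppv, props = prevalence_adjusted(progressors, nonprogressors, prevalence=0.05)
    print(round(npv, 3), round(ppv, 3), {k: round(v, 3) for k, v in props.items()})
```

Under the assumed 5% prevalence the sketch gives an NPV of about 0.98 and class proportions that round to 77%, 15%, and 8%. The PPV comes out near 0.28 rather than the reported 0.26, so the authors' exact inputs or method likely differ from this simplified version.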
Multivariate Cox models evaluating a reduced model with clinical variables only and a full model with the 15-feature classifier added showed that the high-risk class provided prognostic power that was independent of the general and gastrointestinal subspecialist pathologist's diagnosis, segment length, age, gender, and percentage of cells overexpressing p53 in the validation set (Table 3A). The gastrointestinal subspecialist diagnosis showed prognostic power when evaluated alone; however, it was no longer statistically significant when the predicted risk classes were added to the Cox model (Table 3B). The risk classifier and the gastrointestinal subspecialist diagnosis were correlated (χ2, P = 0.002), indicating that both were prognostic. However, the prognostic power of the gastrointestinal subspecialist diagnosis was attenuated whereas the risk classifier remained significant in the multivariate Cox model, indicating that the risk classifier had stronger prognostic power. There was no strong correlation between the risk classifier and the other evaluated variables, including segment length, age, percentage of cells overexpressing p53 (correlations = 0.22, −0.06, and 0.16, respectively), and gender (χ2, P = 0.36). Similar results were observed when the risk score was evaluated as a continuous variable (Supplementary Table S5).
A. Prognostic performance of risk classes vs. clinical variables^a
Variable | HR (95% CI) | P |
---|---|---|
Analysis without risk prediction test | | |
General pathologist's Dx (LGD vs. ND/IND) | 1.55 (0.67–3.58) | 0.31 |
Segment length (long vs. short) | 2.53 (1.00–6.42) | 0.05 |
Age | 0.99 (0.96–1.02) | 0.38 |
Gender | 1.47 (0.51–4.29) | 0.48 |
p53 (% cells overexpressing)^b | 6.87 (0.01–4755.13) | 0.56 |
Analysis with risk prediction test | | |
General pathologist's Dx (LGD vs. ND/IND) | 1.27 (0.53–3.01) | 0.59 |
Segment length (long vs. short) | 1.91 (0.75–4.87) | 0.17 |
Age | 0.99 (0.96–1.02) | 0.4 |
Gender | 1.01 (0.34–3.05) | 0.98 |
p53 (% cells overexpressing) | 0.6 (0–728.87) | 0.89 |
Risk classes (predicted by the test) | | |
Intermediate vs. low risk | 2.11 (0.66–6.7) | 0.21 |
High vs. low risk | 7.27 (3.2–16.49) | <0.0001 |

B. Prognostic performance of risk classes vs. GI subspecialist^c
Variable | HR (95% CI) | P |
---|---|---|
Analysis without risk prediction test | | |
GI subspecialist pathologist's Dx (LGD vs. ND/IND) | 3.19 (1.24–8.2) | 0.02 |
Analysis with risk prediction test | | |
GI subspecialist pathologist's Dx (LGD vs. ND/IND) | 1.33 (0.5–3.53) | 0.57 |
Risk classes (predicted by the test) | | |
Intermediate vs. low risk | 2.37 (0.95–5.93) | 0.07 |
High vs. low risk | 8.95 (4.27–18.77) | <0.0001 |
NOTE: Multivariate Cox models were run in which progression to HGD/EAC was evaluated first in relation to clinical variables alone, then in relation to risk classes and clinical variables, in nonprogressor patients and incident progressor patients in the validation set. Pathologist diagnosis, gender, segment length, and risk classes were dichotomized as described in Materials and Methods. Age and p53 were evaluated as continuous variables. Mann–Whitney tests showed no statistically significant difference in age, gender, or segment length between progressors and nonprogressors (P = 0.72, 0.36, 0.09, respectively).
Abbreviation: GI, gastrointestinal.
^a n = 30 incident progressor patients and n = 103 nonprogressor patients with complete data for all evaluated variables.
^b Calculated by the image analysis software as described in Materials and Methods.
^c n = 38 incident progressor patients and n = 145 nonprogressor patients (all validation set patients) for analysis in part B.
Discussion
Using a nested case–control study design, we developed and validated a novel multivariable test that predicts future risk of progression to HGD/EAC in Barrett's esophagus patients. The test produces a risk score that can be used as a predictor to estimate 5-year risk for progression to HGD/EAC. The test incorporates 3-tier stratification to classify patients as low-, intermediate-, or high-risk for progression. The predicted high-risk group was at 9.4-fold increased risk of developing HGD/EAC compared with the low-risk group. Furthermore, the risk classes provided independent predictive information that outperformed traditional risk factors in this study, including the general pathologist diagnosis and the expert gastrointestinal diagnosis. Importantly, the test demonstrated risk stratification that was independent of traditional clinical variables in a separate validation cohort of Barrett's esophagus patients.
The test has the potential to complement histologic analysis and offer providers and patients an individualized risk score for progression. The 3-tier classifier identifies patients at very low risk of progression within 5 years, as demonstrated by the prevalence-adjusted NPV of 0.98 in the validation set. If further validated, this finding suggests that the frequency of surveillance in this patient group can potentially be extended to 5 years. The classifier also identifies patients at very high risk of progression, with prevalence-adjusted PPV estimated at 0.26, demonstrating high predictive performance considering the very low frequency of progression in Barrett's esophagus. Current clinical guidelines recommend endoscopic ablative therapy for confirmed HGD (8) and there is growing evidence to support ablative therapy for confirmed LGD (36, 37). Validation of this risk prediction approach provides potential support to extend ablative therapy to Barrett's esophagus patients with IND and ND with high-risk scores. As with any clinical test, the predictive accuracy will depend on disease prevalence, which is likely to vary between different clinical settings. The approach described here is quantitative, objective, and in this study outperformed the generalist and subspecialist diagnosis, which are both prone to interobserver variation. While our approach would initially add cost, there is the compensatory potential to lower future costs due to reduced surveillance frequency in low-risk patients, and early treatment in high-risk patients.
The limitations of this study include the retrospective nature and the limited sample size. A larger prospective study would not have been feasible due to the very low prevalence of progression in Barrett's esophagus. The study lacked central pathology review, although all cases were reviewed by a single gastrointestinal subspecialist at each U.S. institution. The cohort included patients in surveillance at multiple centers, which prevented standardization of biopsy fixation and storage protocols. However, the biopsies reflect routine Barrett's esophagus samples. While digital pathology has gained traction, the use of imaging and image analysis in pathology laboratories remains limited. The approach described here could be deployed in a central reference clinical laboratory equipped with the necessary imaging instrumentation, software, and staff skilled in digital pathology. The study design included training on a set of patients from multiple institutions with the rationale that a multiinstitutional set would be more representative of the Barrett's esophagus patient population, and therefore a predictive model built in the training set would be more likely to validate in a separate testing set and more likely to be of utility in the general population. An alternative analysis approach would be to train the classifier in a set of patients from a single institution and validate it in an independent set from a different institution. Future studies will include independent validation in Barrett's esophagus patient cohorts from institutions not included in the development and initial validation of the 15-feature risk score.
Many biomarkers have shown promise for risk prediction in Barrett's esophagus (14–17), including the biomarkers evaluated in this study; however, no biomarkers for risk prediction have been validated or translated into practice to date. The challenges to risk prediction include genetic and nongenetic heterogeneity (38) and the resulting need to assess multiple pathways of carcinogenesis. The role of both epithelial and stromal components in carcinogenesis suggests that a systems biology approach may overcome prior study limitations (19). While p53 IHC has diagnostic value (18), it is not sufficient as a single biomarker for risk prediction (15, 24). The methods used to evaluate biomarkers have also hindered implementation. Traditional pathology methods are limited by the difficulties in managing multiple IHC tests on limited biopsies and the challenges of manually integrating multivariable data into a prognosis. The testing approach validated in this study aids pathology by objectively measuring multiple molecular and cellular abnormalities that can precede the morphologic changes. The risk score identifies high-risk Barrett's esophagus patients as having loss of tumor suppression and cell-cycle control, stromal angiogenesis, altered patterns of infiltrating lymphocytes, increased inflammation, and morphology abnormalities, which are early indicators of progression. The classifier utilizes multiple image analysis features extracted from the same biomarker to capture multiple expression patterns (Table 2). For example, p53 is frequently mutated in Barrett's esophagus and while some mutations lead to p53 protein accumulation, others lead to loss (39). By assessing multiple features, the classifier aims to quantify multiple patterns of biomarker abnormalities in a standardized, objective manner.
The classifier also incorporates microenvironment-based features (20) that capture localized abnormalities such as focal AMACR-overexpression and clusters of HIF1α-overexpressing cells. Various molecular approaches have also been applied to diagnostic and prognostic testing in Barrett's esophagus (40–42). While these technologies have aided biomarker discovery and show promise in risk prediction, they have the disadvantage of requiring tissues to be digested, resulting in loss of morphology and spatial relationships, which may be relevant to patient outcomes. Furthermore, specific tests using these genomic technologies have not been independently validated in Barrett's esophagus.
In addition to the advantages of the technology platform, this study was strengthened by use of a diverse patient cohort from four institutions. An additional strength was the exclusion of patients with prevalent HGD/EAC, enabling development of a test that predicts incident progression. Furthermore, the test was validated on a separate set of Barrett's esophagus patients. The assay can be performed on sections from FFPE blocks.
In summary, the tissue systems pathology approach validated in this study quantifies multiple epithelial and stromal processes and better predicted risk of progression to HGD/EAC in Barrett's esophagus patients than clinical variables, including pathologic diagnosis, in this study. This approach provides an opportunity to improve upon current qualitative histology as a quantitative method to risk stratify patients with Barrett's esophagus.
Disclosure of Potential Conflicts of Interest
R.J. Critchley-Thorne has ownership interest (including patents) in Cernostics, Inc. (inventor on a patent application for the TissueCypher technology used in this study). J.W. Prichard is a consultant/advisory board member for Cernostics, Inc. and has primary employment at Geisinger Health System, which has a financial interest in Cernostics, Inc., the commercial entity that developed the proprietary TissueCypher technology used in this study. B.B. Campbell was formerly a full-time employee of Cernostics, Inc. and has ownership interest (including patents) in a patent for the TissueCypher technology used in this study. Y. Zhang is a consultant/advisory board member for Cernostics, Inc. K.A. Repa was formerly a full-time employee of Cernostics, Inc. L.M. Reese is a scientist at Cernostics, Inc. and has ownership interest (including patents) in Cernostics, Inc. J. Li and D.L. Diehl have primary employment at Geisinger Health System, which has a financial interest in Cernostics, Inc., the commercial entity that developed the proprietary TissueCypher technology used in this study. D. Lansing Taylor is an advisor at Cernostics, Inc., has ownership interest (including patents) in Cernostics, Inc., and is a consultant/advisory board member for Cernostics, Inc. A.K. Rustgi is a non-paid member of the advisory board for Cernostics, Inc. and has been provided stock with no value. No potential conflicts of interest were disclosed by the other authors.
Authors' Contributions
Conception and design: R.J. Critchley-Thorne, J.W. Prichard, B.A. Jobe, J. Li, A.K. Rustgi
Development of methodology: R.J. Critchley-Thorne, B.B. Campbell, K.A. Repa, J. Li
Acquisition of data (provided animals, acquired and managed patients, provided facilities, etc.): R.J. Critchley-Thorne, L.C. Duits, J.W. Prichard, J.M. Davison, B.B. Campbell, L.M. Reese, J. Li, D.L. Diehl, N.C. Jhala, M. DeMarshall, T. Foxwell, A.H. Zaidi, J.J.G.H.M. Bergman, G.W. Falk
Analysis and interpretation of data (e.g., statistical analysis, biostatistics, computational analysis): R.J. Critchley-Thorne, L.C. Duits, B.A. Jobe, B.B. Campbell, Y. Zhang, L.M. Reese, J.J.G.H.M. Bergman
Writing, review, and/or revision of the manuscript: R.J. Critchley-Thorne, L.C. Duits, J.W. Prichard, J.M. Davison, B.A. Jobe, Y. Zhang, L.M. Reese, J. Li, D.L. Diehl, N.C. Jhala, G. Ginsberg, A.H. Zaidi, D.L. Taylor, A.K. Rustgi, J.J.G.H.M. Bergman, G.W. Falk
Administrative, technical, or material support (i.e., reporting or organizing data, constructing databases): R.J. Critchley-Thorne, L.C. Duits, J.M. Davison, B.B. Campbell, K.A. Repa, L.M. Reese, M. DeMarshall, A.K. Rustgi
Study supervision: R.J. Critchley-Thorne
Other (obtained grant funding): R.J. Critchley-Thorne
Acknowledgments
The authors thank Georgiann Noll and Matthew Barley (Geisinger) and Xuan Mai Nguyen (Cernostics) for technical assistance, and David Rimm and Janet Warrington for many helpful discussions and advice on the research study.
Grant Support
This work was supported by a Pennsylvania Department of Health Cure Program Grant, Research on Cancer Diagnostics or Therapeutics with Commercialization Potential RFA#10-07-03 (to R.J. Critchley-Thorne, J.M. Davison, G.W. Falk, J.W. Prichard, Y. Zhang), a Qualifying Therapeutic Discovery Project Grant, Internal Revenue Service/Affordable Care Act 2010 (to R.J. Critchley-Thorne), and by Cernostics, Inc. Partial support was provided by NIH/NCI U54-CA163004 (to A.K. Rustgi and G.W. Falk) and by the Molecular Pathology and Imaging and Molecular Biology Core Facilities at the University of Pennsylvania.
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.