Abstract
Blood-based biomarkers for gastric cancer risk stratification could help target screening to the people who would benefit most from it. The ABC Method, which stratifies individuals by their Helicobacter pylori infection and serum-diagnosed chronic atrophic gastritis status, is currently used in Japan for this purpose. Most gastric cancers are caused by chronic H. pylori infection, but few studies have explored whether antibody responses to H. pylori proteins can improve gastric cancer risk prediction beyond established predictors.
We used the least absolute shrinkage and selection operator (Lasso) to build a predictive model of noncardia gastric adenocarcinoma risk from serum data on pepsinogen and antibody response to 13 H. pylori antigens as well as demographic and lifestyle factors from a large international study in East Asia.
Our best model had a significantly higher AUC [73.79%; 95% confidence interval (CI), 70.86%–76.73%] than the ABC Method (68.75%; 95% CI, 65.91%–71.58%; P < 0.001). At 75% specificity, the new model had greater sensitivity (58.67% vs. 52.68%) and NPV (68.24% vs. 66.29%) than the ABC Method.
Along with serologically defined chronic atrophic gastritis, antibody response to the H. pylori proteins HP 0305, HP 1564, and UreA can improve the prediction of gastric cancer risk.
The new risk stratification model could help target more invasive gastric screening resources to individuals at high risk.
Introduction
Gastric cancer is the fifth most common cancer in the world and the third leading cause of cancer death, with about one million incident cases and 750,000 deaths annually (1, 2). Its incidence is particularly high in East Asia, especially in China, Japan, and Korea (1, 2). More than 90% of diagnosed stomach cancers in East Asia are adenocarcinomas arising in the gastric glands (3). Five-year survival from gastric cancer is low, but the disease can often be treated successfully if diagnosed early (4, 5). To identify more accurately and noninvasively who would benefit from gastric cancer screening, Miki and colleagues created the ABC Method for risk stratification in Japan (6, 7). The ABC Method combines two serum assays: one for infection with Helicobacter pylori (H. pylori) and another for pepsinogen-defined chronic atrophic gastritis (CAG). H. pylori is a bacterium that lives in the gastric mucosa and has been implicated in gastric carcinogenesis (8, 9). It is estimated to infect 50% of the world's population and is especially common in East Asia, particularly among older individuals (10, 11). CAG, which usually results from prolonged H. pylori infection, has been associated with gastric cancer incidence (8, 12), and CAG progression has been associated with a stepwise reduction in serum pepsinogen levels (13). Pepsinogens are pepsin proenzymes that can be measured in the blood to detect changes to the gastric mucosa, including atrophic gastritis (14, 15). The ABC Method defines CAG as an absolute serum pepsinogen I concentration ≤70 μg/L together with a pepsinogen I:pepsinogen II ratio ≤3.
The ABC Method is limited, however, in that it does not account for differences in risk conferred by seropositivity to different H. pylori proteins. For example, H. pylori-seropositive individuals who also have serum antibodies to the cytotoxin-associated gene A (CagA) antigen generally have a higher risk of gastric cancer than individuals who are seropositive for H. pylori but not for CagA (16, 17). Possibly for this reason, the ABC Method predicts gastric cancer risk relatively poorly: a 2019 study in a Chinese population at high risk of gastric cancer found that it had an AUC of just 52.70% [95% confidence interval (CI), 47.60%–57.90%; ref. 18]. Nor does it account for the underlying continuous association between serum pepsinogen levels and gastric cancer risk, or for variation in the pepsinogen thresholds for CAG that depends on H. pylori serostatus (19, 20).
In this study, we used supervised machine learning to build a predictive model of gastric cancer risk that incorporated antibody responses to H. pylori-specific proteins and continuous coding of serum pepsinogen levels. We used data from a large East Asian nested case–control study of gastric cancer that collected individual-level data on demographic and lifestyle risk factors, serum pepsinogen, and host immune response to H. pylori-specific proteins (21). The model's discrimination was assessed using ROC curves, with particular attention to sensitivity rather than specificity. Sensitivity was prioritized because, at an early and minimally invasive stage of screening, the main goal is to identify as many treatable precancerous lesions or early cancers as possible. Finally, we internally validated the model using leave-one-out cross-validation (LOOCV).
Materials and Methods
Study population
This study uses data from three cohorts within the Helicobacter pylori Biomarker Cohort Consortium (HpBCC). The full HpBCC is a nested case–control study drawn from eight cohorts across Japan, China, and the Republic of Korea, created to identify potential biomarkers for H. pylori-related health outcomes, including gastric cancer (22, 23). The cohorts collected demographic and lifestyle information as well as blood samples from healthy individuals at baseline (23). The primary outcome of interest was noncardia gastric cancer, defined by International Classification of Diseases for Oncology codes C16.1–C16.6, C16.8, and C16.9. Three of these cohorts measured pepsinogen in participants' sera: the Japan Public Health Center Study (JPHC) I and JPHC II in Japan and the Linxian Nutrition Intervention Trial (NIT) in China (24, 25). We therefore used data from these three cohorts only to build our predictive model.
The JPHC I and II used incidence density sampling to select controls. At the time of diagnosis of each index gastric cancer case, a control was selected randomly from among all currently living members of the cohort from which the case arose who had no history of gastric cancer or gastrectomy. JPHC I and II controls were individually matched to cases based on gender, birth date, and blood collection date. NIT controls were frequency matched to cases based on gender and age.
This study was approved by the institutional review boards of the University of North Carolina at Chapel Hill and Duke University.
H. pylori multiplex serology
Pepsinogen and host response to H. pylori proteins were assessed from 10 mL serum or plasma samples collected at the time of the baseline questionnaire. Samples were collected in 1985 in the NIT, from 1990 to 1992 in the JPHC I, and from 1993 to 1995 in the JPHC II. The median follow-up time between blood collection and cancer diagnosis was 7.0 years for the JPHC I, 4.2 years for the JPHC II, and 7.0 years for the NIT (23). In the NIT, samples were frozen and shipped from Linxian to Beijing, where they were stored at −80°C (26). In the JPHC I and II, blood samples were centrifuged within 12 hours of blood draw to obtain plasma and the buffy coat layer, which were stored at −80°C (27). Blood samples were assayed for the HpBCC in 2016 at the German Cancer Research Center (DKFZ), where antibody levels to 13 H. pylori proteins were measured using multiplex serology. H. pylori multiplex serology is based on a glutathione S-transferase capture immunosorbent assay combined with fluorescent bead technology (Luminex) that detects human antibodies to 13 recombinantly expressed H. pylori fusion proteins (UreA, Catalase, GroEL, NapA, CagA, HP0231, VacA, HpaA, Cad, HyuA, HP1564, HcpC, and HP0305; refs. 28, 29). Antigen-specific cutoff points for median reporter fluorescence intensity (MFI) were calculated using 17 sera previously classified as H. pylori negative. H. pylori seropositivity was defined as reactivity with at least four proteins; this definition has shown good agreement (κ = 0.70) with a commercial serological assay, with 89% sensitivity and 82% specificity (28). A total of 24 QC samples were included to test the assay's reliability. Seropositivity calls for all H. pylori proteins were highly consistent, with identical results for 98% (353/360) of the tests (23).
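The seropositivity rule above, together with the continuous recoding described under Statistical analysis (MFI values below an antigen's cutoff set to 0), can be sketched as follows. This is a minimal illustration, not the DKFZ pipeline: the cutoff values in the usage example are placeholders, treating an MFI exactly equal to its cutoff as positive is our assumption, and the helper names (`antigen_status`, `hp_seropositive`, `recode_mfi`) are hypothetical.

```python
# Antigen panel as listed in the text; cutoff values are supplied by the caller
# (hypothetical placeholders here), not the published DKFZ cutoffs.
ANTIGENS = ("UreA", "Catalase", "GroEL", "NapA", "CagA", "HP0231", "VacA",
            "HpaA", "Cad", "HyuA", "HP1564", "HcpC", "HP0305")


def antigen_status(mfi: dict, cutoffs: dict) -> dict:
    """Per-antigen seropositivity; an MFI equal to the cutoff counts as positive (assumption)."""
    return {a: mfi[a] >= cutoffs[a] for a in ANTIGENS}


def hp_seropositive(mfi: dict, cutoffs: dict, min_antigens: int = 4) -> tuple:
    """H. pylori seropositive = reactive with at least `min_antigens` of the 13 proteins.

    Returns (seropositive, number_of_positive_antigens).
    """
    n_pos = sum(antigen_status(mfi, cutoffs).values())
    return n_pos >= min_antigens, n_pos


def recode_mfi(mfi: dict, cutoffs: dict) -> dict:
    """Continuous coding for Lasso: MFI values below the antigen's cutoff become 0."""
    return {a: (float(mfi[a]) if mfi[a] >= cutoffs[a] else 0.0) for a in ANTIGENS}


if __name__ == "__main__":
    cutoffs = {a: 100.0 for a in ANTIGENS}  # placeholder cutoffs
    mfi = {a: (500.0 if a in ("UreA", "CagA", "VacA", "HP1564") else 50.0) for a in ANTIGENS}
    print(hp_seropositive(mfi, cutoffs))  # four reactive antigens: (True, 4)
```

Returning the count alongside the binary call is convenient because the number of seropositive proteins (0 to 13) is itself one of the candidate features.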
Pepsinogen assays
In the NIT, serum pepsinogen I and pepsinogen II were measured using ELISAs (Biohit ELISA Kit) performed by technicians blinded to the subjects' case/control status (19). Two serum samples were taken per individual, and their average was used as the final pepsinogen level (19). For five participants, the two results differed substantially; for these, a third assay was performed and the average of the two closest results was recorded as the final pepsinogen level. Correlation between duplicate samples was very high: the Pearson correlation coefficient was 0.995 for pepsinogen I and 0.997 for pepsinogen II (19). Each assay plate also included two quality control (QC) samples (19). Using all of these QC samples together, the coefficients of variation for the pepsinogen I and pepsinogen II assays were 6.5% and 2.7%, respectively (19). An additional 103 QC samples, aliquoted from a single large serum pool from the National Cancer Hospital in Beijing, were distributed among all assay plates. Considering all of these QC samples together, the coefficients of variation for pepsinogen I and pepsinogen II were 5.5% and 6.7%, respectively (19).
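The rule for combining repeat assays and the coefficient of variation can be stated explicitly. This is a minimal sketch, assuming the CV is the sample standard deviation divided by the mean of repeated measurements of the same QC material. The text does not say what counted as a "large difference" between duplicates, so the function only combines results and does not decide when a third assay is needed. The function names are hypothetical.

```python
from statistics import mean, stdev


def final_pepsinogen(results):
    """Final level from repeat assays of one participant.

    Two results: their mean. Three results (a repeat after a large discrepancy):
    the mean of the two closest values. If both gaps are equal, the lower pair is
    used, an arbitrary tie-break that the text does not specify.
    """
    if len(results) == 2:
        return mean(results)
    if len(results) == 3:
        a, b, c = sorted(results)
        # Among three sorted values, the closest pair is either (a, b) or (b, c).
        return mean((a, b)) if (b - a) <= (c - b) else mean((b, c))
    raise ValueError("expected two or three assay results")


def coefficient_of_variation(values):
    """CV (%) = sample standard deviation / mean * 100 for repeated QC measurements."""
    return 100.0 * stdev(values) / mean(values)
```

For example, duplicates of 40 and 80 μg/L followed by a third result of 42 μg/L give a final level of 41 μg/L, because 40 and 42 are the closest pair.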
In the JPHC I and II, pepsinogen I and pepsinogen II levels were measured in serum by a two-step enzyme immunoassay using commercial kits (E Plate “Eiken” Pepsinogen I and E Plate “Eiken” Pepsinogen II, Eiken Kagaku; ref. 27).
Statistical analysis
Person-time was calculated as the difference between the last date of follow-up and the start date of each cohort (March 1, 1985, for the NIT; January 1, 1990, for the JPHC I; and January 1, 1993, for the JPHC II; refs. 30, 31). For cases, the last date of follow-up was the date of diagnosis of noncardia gastric adenocarcinoma. For JPHC I and II controls, it was the diagnosis date of their matched case. For NIT controls, who were not individually matched to cases, it was the maximum of the date of administrative censoring, date of death, or date of loss to follow-up. Dates of diagnosis for cases were accurate to the day, but for many controls only the year of loss to follow-up was recorded; these controls' last dates were therefore set to January 1 of their last year of follow-up. The Breslow method was used to handle tied event times (32). HRs and 95% CIs were estimated with the survival package in R (33).
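A minimal sketch of the person-time calculation, including the January 1 convention for controls whose last follow-up is known only to the year. It assumes a 365.25-day year, since the text does not state how days were converted to years; the dictionary and function names are hypothetical.

```python
from datetime import date
from typing import Optional

# Cohort start dates as given in the text.
COHORT_START = {
    "NIT": date(1985, 3, 1),
    "JPHC I": date(1990, 1, 1),
    "JPHC II": date(1993, 1, 1),
}


def last_follow_up(year: int, month: Optional[int] = None, day: Optional[int] = None) -> date:
    """Last follow-up date; when only the year is recorded, use January 1 of that year."""
    if month is None or day is None:
        return date(year, 1, 1)
    return date(year, month, day)


def person_years(cohort: str, end: date) -> float:
    """Person-time from the cohort start date to the last follow-up date (365.25-day years)."""
    return (end - COHORT_START[cohort]).days / 365.25
```

For instance, a JPHC I control last seen sometime in 2000 contributes `person_years("JPHC I", last_follow_up(2000))`, or 3,652 days, which is just under 10 years.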
We used the least absolute shrinkage and selection operator (Lasso) to build predictive models (34). Lasso is a regularization technique that selects features by fitting a risk model under the constraint that the sum of the absolute values of the regression coefficients cannot exceed a predetermined threshold, which is controlled by the tuning parameter λ (34–36). Applied to the full set of candidate covariates, this constraint shrinks the coefficients of the covariates that contribute least to predicting the outcome to exactly zero, excluding them from the model; the remaining covariates are selected as predictors of gastric cancer. In total, 37 variables were evaluated as potential features of the gastric cancer classification model. Continuous variables, including age, pack-years of cigarette smoking, serum pepsinogen I, serum pepsinogen II, the serum pepsinogen I/II ratio, and the 13 antibody response variables for H. pylori proteins, were entered into Lasso in linear form. Because H. pylori MFI values are prone to statistical noise at low levels, each antibody response variable was recoded as 0 when it fell below that antigen's predefined seropositivity cutoff. Binary (seropositive vs. seronegative) forms of the 13 H. pylori protein variables were also assessed as potential predictors. Gender, smoking status (current vs. former or never), H. pylori seropositivity, and family history of gastric cancer were coded as binary variables. The number of H. pylori proteins to which an individual was seropositive (range, 0–13) was assessed as a 14-level categorical variable. A binary variable for serologically defined atrophic gastritis was also included (present if pepsinogen I ≤ 70 μg/L and pepsinogen I/II ratio ≤ 3; absent otherwise).
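The estimator can be written in its standard penalized form. This assumes, as the use of HRs and Harrell's c-index suggests, that Lasso was applied to a Cox model. It is a textbook formulation rather than the study's exact objective: the inverse probability of sampling weights are omitted. Here $x_i$ is the vector of candidate feature columns for individual $i$, $\delta_i = 1$ for a case and 0 otherwise, $R(t_i)$ is the risk set at time $t_i$, and $p$ is the number of feature columns.

$$
\hat{\beta}(\lambda) = \arg\min_{\beta}\Big\{-\ell(\beta) + \lambda \sum_{k=1}^{p} |\beta_k|\Big\},
\qquad
\ell(\beta) = \sum_{i:\,\delta_i = 1}\Big[x_i^{\top}\beta - \log \sum_{j \in R(t_i)} \exp\big(x_j^{\top}\beta\big)\Big]
$$

The equivalent constrained form minimizes $-\ell(\beta)$ subject to $\sum_{k} |\beta_k| \le t$. Each $\lambda \ge 0$ corresponds to a bound $t$: a larger $\lambda$ gives a smaller $t$ and typically fewer nonzero coefficients. When event times are tied, using the full risk set $R(t_i)$ for every tied case, as written here, is the Breslow approximation.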
Ten-fold cross-validation was used to choose the tuning parameter. The HpBCC data set was split into 10 subsets; the classification model was fit to nine of them (together comprising the training set), and its classification performance was assessed with Harrell's concordance index (c-index) in the omitted subset (the test set; refs. 37–39). This procedure was repeated 10 times, omitting a different subset each time. Evaluating each fit on data not used to train it guards against overfitting the model to the training data (36). We chose the λ value giving the model with the fewest predictors whose c-index was within one standard error of that of the model with the largest c-index (40–43). We considered this model more appropriate than the model with the largest c-index because (i) smaller models are less likely to be overfit to the training data and (ii) a smaller model may shed more light on which risk factors are most strongly associated with gastric cancer.
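The c-index and the one-standard-error selection rule can be sketched in a few lines. This is an unweighted illustration under our own assumptions, not necessarily the software the study used. `fold_scores` is a hypothetical input holding each candidate λ's c-index in each test fold. The standard error is the between-fold standard deviation divided by √K. The c-index function skips pairs with tied follow-up times for simplicity.

```python
from math import sqrt
from statistics import mean, stdev


def harrell_c_index(time, event, risk):
    """Harrell's c-index: share of comparable pairs ordered correctly by risk score.

    A pair is comparable when one subject has an event strictly before the other's
    follow-up time. Tied risk scores count 1/2; pairs with tied times are skipped
    (a simplification).
    """
    concordant, comparable = 0.0, 0
    for i in range(len(time)):
        if not event[i]:
            continue
        for j in range(len(time)):
            if time[i] < time[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


def lambda_1se(lambdas, fold_scores):
    """Largest λ whose mean CV c-index is within one SE of the best mean c-index.

    fold_scores[k] lists the test-fold c-indices obtained with lambdas[k]. A larger λ
    typically yields fewer predictors, so this approximates "fewest predictors".
    """
    means = [mean(s) for s in fold_scores]
    ses = [stdev(s) / sqrt(len(s)) for s in fold_scores]
    best = max(range(len(lambdas)), key=means.__getitem__)
    threshold = means[best] - ses[best]
    return max(lam for lam, m in zip(lambdas, means) if m >= threshold)
```

Because the c-index is maximized rather than a loss minimized, the tolerance band sits one standard error *below* the best mean score.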
Weighting
Incidence density sampling was used to select controls into the HpBCC from the underlying cohorts, except for the NIT, which was subsampled from a sex- and age-stratified case–cohort study (23, 44). One control was sampled whenever a gastric cancer case was diagnosed, so a control could be selected into the nested case–control study more than once, provided that they were still at risk of the disease (45). From the estimated HRs, the baseline hazard function, and then the risk of gastric cancer in the cohort, can be estimated (45). However, this estimation is complicated by matching: controls were matched to cases by cohort, gender, and age. Matching biases the estimate of the baseline hazard (and thus of risk), because some individuals were more likely than others to be sampled as controls on the basis of their matching characteristics. To account for this potential selection bias, we implemented time-varying inverse probability of sampling weights using the method of Salim and colleagues (45). Details of the weight calculation are given in the Supplementary Materials and Methods.
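The study used the time-varying weights of Salim and colleagues, which are described in the supplement and not reproduced here. To illustrate only the general idea of inverse probability of sampling weights, the sketch below computes a simpler, time-fixed inclusion probability of the kind proposed by Samuelsen for incidence density sampling with m controls per case. A control's weight is the inverse of its probability of ever being sampled, and cases receive weight 1. The input format (case times with the number at risk in the control's matching stratum) and the function names are hypothetical.

```python
def inclusion_probability(exit_time, case_risk_sets, m=1):
    """Samuelsen-type probability that a cohort member is ever sampled as a control.

    case_risk_sets: (case_time, n_at_risk) pairs for cases in the member's matching
    stratum, where n_at_risk counts everyone at risk at case_time, including the case.
    The member is treated as at risk whenever exit_time >= case_time (assumption).
    """
    p_never = 1.0
    for case_time, n_at_risk in case_risk_sets:
        if exit_time >= case_time and n_at_risk > 1:
            # m controls drawn without replacement from the n_at_risk - 1 non-cases
            p_never *= 1.0 - min(1.0, m / (n_at_risk - 1))
    return 1.0 - p_never


def sampling_weight(is_case, exit_time, case_risk_sets, m=1):
    """Inverse probability of sampling weight: 1 for cases, 1/p for sampled controls."""
    if is_case:
        return 1.0
    p = inclusion_probability(exit_time, case_risk_sets, m)
    if p <= 0.0:
        raise ValueError("member could never have been sampled as a control")
    return 1.0 / p
```

Restricting `case_risk_sets` to cases in the same matching stratum captures why matching matters. Members of small strata face fewer competitors at each case time, so they are more likely to be sampled and receive smaller weights.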
Model performance analysis
We assessed the classifier's ability to discriminate between gastric cancer cases and controls by calculating the area under the ROC curve (AUC) at 10 years of follow-up within the training data set (41). We also report sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV). PPV and NPV were calculated using the prevalence of gastric cancer in the underlying cohorts, or within the relevant strata for stratified analyses. The timeROC package in R was used to construct time-dependent ROC curves and to estimate their AUCs, sensitivity, and specificity (46).
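Given a sensitivity, a specificity, and a prevalence, PPV and NPV follow from Bayes' theorem. This minimal sketch assumes that this standard relation is what "calculated using the prevalence" refers to. It ignores the time dependence and sampling weights handled by timeROC, so it is not expected to reproduce the values in Table 3 exactly. The function name is hypothetical.

```python
def ppv_npv(sensitivity: float, specificity: float, prevalence: float) -> tuple:
    """Return (PPV, NPV) from Bayes' theorem for a binary classifier.

    The four terms are expected proportions of the population, not counts.
    """
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    true_neg = specificity * (1.0 - prevalence)
    false_neg = (1.0 - sensitivity) * prevalence
    return true_pos / (true_pos + false_pos), true_neg / (true_neg + false_neg)
```

With 80% sensitivity, 90% specificity, and 10% prevalence, PPV is 0.08/0.17 ≈ 47% and NPV is 0.81/0.83 ≈ 98%. This illustrates how strongly PPV depends on the prevalence used.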
Assessing discrimination within the training data usually gives overly optimistic estimates of a model's ability to predict the case or noncase status of observations it has not seen before. We therefore internally validated the model using LOOCV (39, 47). The strengths and weaknesses of LOOCV are described in detail in the Supplementary Materials and Methods.
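LOOCV refits the model once per observation and scores each observation with the model that excluded it. The generic sketch below assumes user-supplied `fit` and `predict` callables as hypothetical stand-ins for refitting the study's model. It does not repeat the 10-fold λ selection inside each refit, which the text does not specify.

```python
from statistics import mean
from typing import Any, Callable, List, Sequence


def loocv_predictions(X: Sequence[Any], y: Sequence[Any],
                      fit: Callable, predict: Callable) -> List[Any]:
    """Leave-one-out predictions: observation i is scored by a model fit without it."""
    preds = []
    for i in range(len(y)):
        X_train = list(X[:i]) + list(X[i + 1:])
        y_train = list(y[:i]) + list(y[i + 1:])
        model = fit(X_train, y_train)  # refit on the other n - 1 observations
        preds.append(predict(model, X[i]))
    return preds


if __name__ == "__main__":
    # Toy stand-in for the real refit (hypothetical): the "model" is the
    # training-set case proportion, and it predicts that value for everyone.
    def fit_proportion(X_train, y_train):
        return mean(y_train)

    def predict_constant(model, x):
        return model

    print(loocv_predictions([0, 0, 0, 0], [1, 0, 0, 1], fit_proportion, predict_constant))
```

The resulting predictions can then be passed to any ROC routine, just as the whole-data predictions are.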
We also compared the Lasso model's performance with that of the ABC Method, an established gastric cancer risk stratification model currently in use in Japan and the Republic of Korea (7, 48). The ABC Method classifies individuals by two binary variables: H. pylori infection status and serologically defined CAG. Its four levels, in ascending order of gastric cancer risk, are A (H. pylori–, CAG–), B (H. pylori+, CAG–), C (H. pylori+, CAG+), and D (H. pylori–, CAG+). Because of the small number of individuals in Group D, and following previous research that found little difference in gastric cancer risk between the two highest groups, we coded the ABC Method variable with three categories (A, B, and C+D; refs. 49, 50).
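The three-category ABC coding can be written as a small function. This minimal sketch follows the CAG definition given in the text (pepsinogen I ≤70 μg/L and pepsinogen I/II ratio ≤3) and assumes pepsinogen values in μg/L. The function names are hypothetical.

```python
def has_cag(pg1: float, pg2: float) -> bool:
    """Serologically defined CAG: pepsinogen I <= 70 ug/L and PG I/PG II ratio <= 3."""
    return pg1 <= 70 and pg1 / pg2 <= 3


def abc_group(hp_positive: bool, cag: bool, collapse_cd: bool = True) -> str:
    """ABC Method group; C and D are merged into 'C+D' by default, as in this study."""
    if cag:
        if collapse_cd:
            return "C+D"
        return "C" if hp_positive else "D"
    return "B" if hp_positive else "A"
```

Passing `collapse_cd=False` recovers the original four-level ABC coding.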
All analyses were conducted in R statistical software, version 4.0.2 (51).
Data availability
The data analyzed in this study are available from Duke University. Restrictions apply to the availability of these data, which were used under specific agreement for this study. Data are available from the authors upon reasonable request with the permission of both Duke University and the respective institutions that collected the specimens (the National Cancer Center Research Institute, Japan, the Cancer Institute, Chinese Academy of Medical Sciences, China, and the National Cancer Institute, USA).
Results
Descriptive statistics
There were 708 cases and 714 controls in the analysis set (Table 1). Cases tended to be slightly older than controls (mean age, 57 vs. 55 years). About 45% of participants were recruited from the NIT and 55% from the JPHC I and II. Cases were more likely than controls to be H. pylori seropositive (92% vs. 79%). The median serum pepsinogen I level was lower among cases than among controls (42 μg/L vs. 46 μg/L), as was the median serum pepsinogen I/II ratio (2.9 vs. 4.5). Serologically diagnosed CAG was more common among cases than among controls (49% vs. 33%). The prevalence of current smoking and of family history of gastric cancer was similar in both groups. Because controls were matched to cases by gender, the gender distribution was the same in both groups (about 66% men and 34% women). The age distribution, however, differed somewhat between groups, probably because NIT controls were matched to cases only by 10-year age strata (i.e., 40–50, 50–60, and 60–70 years).
| Variable | Category | Case | Control |
|---|---|---|---|
| n | | 708 | 714 |
| Gender, n (%) | Women | 241 (34.0) | 240 (33.6) |
| | Men | 467 (66.0) | 474 (66.4) |
| Age category, years, n (%) | ≤40 | 16 (2.3) | 53 (7.4) |
| | 41–45 | 48 (6.8) | 78 (10.9) |
| | 46–50 | 87 (12.3) | 89 (12.5) |
| | 51–55 | 142 (20.1) | 135 (18.9) |
| | 56–60 | 181 (25.6) | 156 (21.8) |
| | >60 | 234 (33.1) | 203 (28.4) |
| Study, n (%) | JPHC I | 205 (29.0) | 201 (28.2) |
| | JPHC II | 192 (27.1) | 191 (26.8) |
| | NIT | 311 (43.9) | 322 (45.1) |
| H. pylori status^a, n (%) | H. pylori positive | 651 (91.9) | 561 (78.6) |
| Serum pepsinogen I (μg/L), median (IQR) | | 41.80 (21.78, 109.59) | 46.35 (24.83, 118.21) |
| Pepsinogen I/II ratio, median (IQR) | | 2.93 (1.77, 6.04) | 4.49 (2.44, 8.42) |
| Chronic atrophic gastritis^b, n (%) | Present | 350 (49.4) | 235 (32.9) |
| Smoker, n (%) | Current smoker | 77 (17.9) | 86 (19.2) |
| Family history, n (%) | Family history of gastric cancer | 58 (8.2) | 48 (6.7) |
^a Defined as seropositivity to ≥4 of the 13 H. pylori antigens in the multiplex panel.
^b Defined as pepsinogen I ≤70 μg/L and pepsinogen I/II ratio ≤3.
We calculated crude HRs for each potential predictor to describe its association with gastric cancer (Supplementary Table S1). Individuals seropositive for H. pylori had a higher hazard of gastric cancer than seronegative individuals (HR, 3.34; 95% CI, 2.37–4.71). All antibody response variables were associated with an increased hazard of gastric cancer. Particularly strong associations were observed for HP 1564 (HR, 3.02; 95% CI, 2.23–4.07), CagA (HR, 2.96; 95% CI, 2.20–3.97), and VacA (HR, 2.79; 95% CI, 2.10–3.72; Supplementary Table S1). For the ABC Method, members of Groups B and C+D had a much higher hazard of gastric cancer than members of Group A (Supplementary Table S1). Median MFI values for host response to the 13 proteins are reported in Supplementary Table S2, and plots of the univariate association between the log odds of gastric cancer and each continuous variable considered for the Lasso model are provided in the Supplementary Materials and Methods.
Results from Lasso model building
Out of 37 potential features, Lasso selected six for classifying gastric cancer cases and controls at the prespecified λ.1se threshold (Supplementary Fig. S1). Two features were demographic: gender (binary) and age (linear; centered at 57 years). Three were antibody responses to H. pylori proteins: UreA (linear), HP 0305 (binary), and HP 1564 (binary). The remaining feature was a pepsinogen-based variable: CAG (binary). The equation for the Lasso risk prediction model is given in the Supplementary Materials and Methods (page 2).
Table 2 gives the Lasso-developed model's coefficient values at 10 years of follow-up. The strongest associations with gastric cancer were found with serologically defined CAG (HR, 3.11), HP 1564 status (HR, 1.85), and HP 0305 (HR, 1.61).
| Parameter | Coefficient (SE) | HR (95% CI) |
|---|---|---|
| Gender (men; binary) | 0.428 (0.081) | 1.53 (1.18–1.99) |
| Age^a | 0.039 (0.005) | 1.04 (1.03–1.05) |
| UreA (linear) | 0.00005 (0.00002) | 1.00 (1.00–1.00) |
| HP 0305 (binary) | 0.473 (0.090) | 1.61 (1.21–2.12) |
| HP 1564 (binary) | 0.616 (0.126) | 1.85 (1.28–2.68) |
| CAG^b (binary) | 1.31 (0.084) | 3.11 (2.35–4.12) |
^a Centered at 57 years (median age in the data set).
^b Defined as pepsinogen I ≤70 μg/L and pepsinogen I/II ratio ≤3.
We compared the ability of the Lasso model and the ABC Method to discriminate between gastric cancer cases and controls within the whole data set by plotting ROC curves and reporting the AUC, sensitivity, specificity, PPV, and NPV at 10 years of follow-up (Table 3). The threshold for calculating sensitivity and specificity was a risk score >1.0, the median risk score generated by the Lasso model (the median ABC Method risk score was 0.8). An individual's risk score is the exponential of the sum, over all features in the predictive model, of each feature's β coefficient multiplied by the individual's value for that feature. The ABC Method had an AUC of 68.75% (95% CI, 65.91%–71.58%), a sensitivity of 52.68% (95% CI, 48.49%–56.88%), and a specificity of 78.86% (95% CI, 75.37%–82.35%). Compared with the ABC Method, the Lasso model had a higher AUC (73.79%; 95% CI, 70.86%–76.73%; P < 0.001) and sensitivity (73.56%; 95% CI, 69.86%–77.27%) at a lower specificity (60.76%; 95% CI, 56.59%–64.94%). The ABC Method's PPV (67.86%; 95% CI, 63.12%–72.60%) was higher than that of the Lasso model (61.37%; 95% CI, 57.45%–65.29%), although the 95% CIs overlapped. The NPV of the Lasso model (73.06%; 95% CI, 69.22%–76.90%) was higher than that of the ABC Method (66.29%; 95% CI, 62.90%–69.68%), with slight overlap of the 95% CIs. We also compared the ABC Method and the Lasso model at an equal specificity of 75% (Supplementary Table S3). At this specificity, the Lasso model had greater sensitivity (58.67% vs. 52.68%) and NPV (68.24% vs. 66.29%) than the ABC Method and a similar but slightly lower PPV (66.75% vs. 67.85%).
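The risk score described above can be computed directly from the rounded coefficients printed in Table 2. This sketch is illustrative only: the exact model equation is in the supplement, and the variable names are hypothetical. It assumes UreA enters as its raw MFI, recoded to 0 below the seropositivity cutoff as described in the Methods.

```python
from math import exp

# Rounded coefficients as printed in Table 2 (log-HR scale, 10 years of follow-up).
COEF = {
    "male": 0.428,
    "age_centered": 0.039,   # per year; age centered at 57 years
    "urea_mfi": 0.00005,     # per MFI unit; 0 below the UreA seropositivity cutoff
    "hp0305": 0.473,         # seropositive = 1
    "hp1564": 0.616,         # seropositive = 1
    "cag": 1.31,             # serologically defined CAG = 1
}


def lasso_risk_score(male: bool, age: float, urea_mfi: float,
                     hp0305: bool, hp1564: bool, cag: bool) -> float:
    """exp(sum of beta * feature value) over the six selected features."""
    linear_predictor = (
        COEF["male"] * male
        + COEF["age_centered"] * (age - 57)
        + COEF["urea_mfi"] * urea_mfi
        + COEF["hp0305"] * hp0305
        + COEF["hp1564"] * hp1564
        + COEF["cag"] * cag
    )
    return exp(linear_predictor)


def is_high_risk(score: float, threshold: float = 1.0) -> bool:
    """Classification rule used for Table 3: risk score > 1.0."""
    return score > threshold
```

Consider a 57-year-old woman who is seronegative for UreA, HP 0305, and HP 1564 and CAG negative. Her linear predictor is 0 and her score is exactly 1.0, so she falls just below the >1.0 cutoff. The same profile in a man gives exp(0.428) ≈ 1.53 and is classified as high risk.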
| Model | AUC^a (%) | P^b | Sensitivity^c (%) | Specificity (%) | PPV (%) | NPV (%) |
|---|---|---|---|---|---|---|
| Classification within the whole data set | | | | | | |
| ABC Method^d | 68.75 (65.91, 71.58) | <0.001 | 52.68 (48.49, 56.88) | 78.86 (75.37, 82.35) | 67.86 (63.12, 72.60) | 66.29 (62.90, 69.68) |
| Lasso model^e | 73.79 (70.86, 76.73) | | 73.56 (69.86, 77.27) | 60.76 (56.59, 64.94) | 61.37 (57.45, 65.29) | 73.06 (69.22, 76.90) |
| Prediction from LOOCV | | | | | | |
| ABC Method^d | 56.60 (52.90, 60.29) | N/A | 52.68 (48.49, 56.88) | 78.86 (75.37, 82.35) | 67.86 (63.12, 72.60) | 66.29 (62.90, 69.68) |
| Lasso model^e | 73.37 (70.42, 76.32) | | 73.56 (69.86, 77.27) | 60.38 (56.19, 64.58) | 61.14 (57.22, 65.06) | 72.94 (69.10, 76.78) |
^a Area under the receiver operating characteristic curve. Time-dependent results are reported at 10 years of follow-up.
^b Tests the null hypothesis that AUC(Lasso) = AUC(ABC) at 10 years of follow-up. No P value is given for the prediction AUCs because, under LOOCV, they result from different data sets.
^c Threshold at gastric cancer risk score >1.
^d Three levels: A [H. pylori–, chronic atrophic gastritis (CAG)–], B (H. pylori+, CAG–), and C+D (H. pylori+, CAG+; or H. pylori–, CAG+).
^e Six predictors: gender (binary), age (continuous; centered at 57 years), UreA (continuous), HP 0305 (binary), HP 1564 (binary), and serologically defined CAG (binary).
Figure 1 shows the classification ROC curves for the Lasso model and the ABC Method, with points marking the risk score >1.0 threshold. The ABC Method ROC curve has a trapezoidal shape because the method can assign only one of three predicted probabilities to each individual: one for Group A, one for Group B, and one for Group C+D. The same discrimination patterns were observed when the two models were compared at 5 years of follow-up (Supplementary Table S4). For further comparison, we assessed the discrimination of the ABC Method combined with the variables selected into the Lasso model (Supplementary Tables S3 and S4). This combined model performed almost identically to the Lasso model; DeLong's test of the difference in AUC between these two models was not significant at 5 years (P = 0.46) or 10 years of follow-up (P = 0.37). Finally, a likelihood ratio test comparing the Lasso model with the Lasso model plus the ABC Method variable was not statistically significant (P = 0.85).
To assess how well the Lasso model may classify data it has not seen before, we used LOOCV. Under LOOCV, the ABC Method had an AUC of 56.60% (95% CI, 52.90%–60.29%), a sensitivity of 52.68% (95% CI, 48.49%–56.88%), and a specificity of 78.86% (95% CI, 75.37%–82.35%; Table 3). The Lasso model's AUC under LOOCV was 73.37% (95% CI, 70.42%–76.32%), its sensitivity was 73.56% (95% CI, 69.86%–77.27%), and its specificity was 60.38% (95% CI, 56.19%–64.58%). The PPV and NPV of both models were very similar to their values in the training data. Figure 2 shows the prediction ROC curves for the Lasso model and the ABC Method, with points marking the risk score >1.0 threshold. The ABC Method ROC curve in Supplementary Fig. S2 has an irregular shape because of LOOCV: the training data changed slightly each time a different individual was left out as the test set, so individuals within the same ABC level received slightly different predicted probabilities of gastric cancer (usually differing in the fifth decimal place). In reality, the ABC Method can produce only three distinct predicted probabilities, as in Fig. 1.
We also evaluated the classification ability of the Lasso model derived from the whole HpBCC data set within strata of gender (men vs. women) and study site (JPHC I, JPHC II, and NIT). Under classification within the training data, the Lasso model had a significantly higher AUC than the ABC Method within all strata except the JPHC I (Supplementary Table S5). Under LOOCV, the Lasso model also had a higher AUC than the ABC Method within all strata (Supplementary Table S6). Within the NIT data, the ABC Method had very high sensitivity (88.34%; 95% CI, 84.13%–92.55%) compared with the Lasso model (81.17%; 95% CI, 76.03%–86.30%). Within the JPHC I data, the Lasso model had higher sensitivity (83.93%; 95% CI, 77.81%–90.04%) than the ABC Method (76.23%; 95% CI, 69.06%–83.41%). The Lasso model's AUC was more than twice that of the ABC Method in the NIT and JPHC I strata, and 13% greater in the JPHC II stratum (Supplementary Table S5). In the JPHC II stratum, the ABC Method had near-perfect sensitivity at 12.50% specificity, whereas the Lasso model had a sensitivity of 86.85% (95% CI, 81.42%–92.28%) at 25.00% specificity. However, the JPHC II statistics were imprecise because of the small sample size and because all controls had been censored before 10 years of follow-up; we therefore report discrimination statistics at 9 years of follow-up for the JPHC II. The ABC Method had a higher PPV than the Lasso model within gender strata, but the pattern differed across cohort strata: in the NIT, the Lasso model had a higher PPV than the ABC Method, and the PPV estimates within the JPHC I and II were very similar for both models (Supplementary Table S5). The Lasso model's NPV was higher than that of the ABC Method within gender strata and within the JPHC I. Within the JPHC II, however, the ABC Method had a considerably higher NPV than the Lasso model (although the 95% CIs were very wide). The NPV of both models was very similar within the NIT.
When specificity was held constant at 75%, the Lasso model outperformed the ABC Method in sensitivity and NPV within all gender and study site strata (Supplementary Table S7). These differences persisted under LOOCV (Supplementary Table S8).
The Lasso model was also assessed within strata of CAG status. It exhibited a higher AUC among CAG+ individuals (71.29%; 95% CI, 66.03%–76.54%) than CAG– individuals (65.80%; 95% CI, 61.62%–69.98%; Supplementary Table S9). The same association was maintained under LOOCV: the AUC among CAG+ individuals was 70.65% (95% CI, 65.36%–75.93%) whereas it was 64.83% (95% CI, 60.62%–69.04%) among CAG– individuals (Supplementary Table S10).
Sensitivity analyses
The association of continuous MFI levels of H. pylori antibody responses, such as UreA in our Lasso model, with gastric cancer has been less consistently established than that of binary seropositivity variables such as HP 0305 and HP 1564 (28). Therefore, as a sensitivity analysis, we removed UreA from the Lasso model and reassessed its discrimination. The model without UreA had a classification AUC of 73.41% (95% CI, 70.46%–76.37%), similar to that of the model with UreA (Table 3). At 60.00% specificity, the model without UreA had a sensitivity of 74.86% (95% CI, 71.21%–78.50%). Under LOOCV, the model without UreA maintained the same AUC of 73.41% (95% CI, 70.46%–76.37%) as within the whole data set, but had a slightly lower sensitivity of 74.69% (95% CI, 71.05%–78.34%) and a lower specificity of 59.43% (95% CI, 55.23%–63.62%) at the risk score threshold of 1.0.
In this data set, 17 individuals were seronegative for H. pylori (i.e., seropositive to fewer than four H. pylori proteins) but seropositive for chronic atrophic gastritis. Such individuals have been found to have an extremely high risk of stomach cancer (7). They may have had a previous H. pylori infection that caused gastric atrophy severe enough to render the stomach inhospitable to the bacterium (7). Labeling them as H. pylori seronegative may therefore be a misclassification, which highlights the limits of serology in capturing exposure history. We therefore conducted a sensitivity analysis in which we excluded these 17 observations from the data set before applying Lasso to build the predictive model.
In addition, previous research has suggested that using CagA seropositivity together with H. pylori seropositivity may classify H. pylori infection status more accurately than H. pylori seropositivity alone (27). We explored this by redefining H. pylori seropositivity as a positive antibody response to CagA and at least three additional H. pylori proteins. Individuals were classified as seronegative for H. pylori infection if they were neither seropositive for CagA nor seropositive for at least four H. pylori proteins. Individuals who were serodiscordant for H. pylori and CagA, that is, seropositive for CagA but not for at least four H. pylori proteins, or seropositive for ≥4 H. pylori proteins but not for CagA, were treated as missing and excluded from this sensitivity analysis set. Applying these exclusion criteria left 1,325 observations in the sensitivity analysis set (93% of the observations used in the main analysis set).
In this reduced data set, Lasso selected pepsinogen I in addition to the six variables chosen in the main analysis (Table 3). The AUC of the model including pepsinogen I was 76.56% (95% CI, 73.14%–79.98%). This was significantly greater than the AUC of the model without pepsinogen I in this data set (70.60%; 95% CI, 66.67%–74.52%; P < 0.001, Fig. 2) and than the AUC of the ABC Method (66.84%; 95% CI, 63.01%–70.67%; P < 0.001, Fig. 2). The model with pepsinogen I had a sensitivity of 75.42% (95% CI, 70.52%–80.32%) and a specificity of 62.76% (95% CI, 58.46%–67.05%). In contrast, the model without pepsinogen I had a sensitivity of 67.68% (95% CI, 62.37%–72.99%) and a specificity of 59.05% (95% CI, 54.68%–63.42%). In this data set, the ABC Method had a sensitivity of 56.90% (95% CI, 51.28%–62.53%) and a specificity of 79.63% (95% CI, 76.04%–83.22%). Under LOOCV, the model with pepsinogen I had a predictive AUC of 76.56% (95% CI, 73.14%–79.99%), a sensitivity of 75.76% (95% CI, 70.89%–80.64%), and a specificity of 62.35% (95% CI, 58.03%–66.66%).
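The exclusion and reclassification rules of this sensitivity analysis can be written as a small decision function. The sketch below is illustrative only. It assumes that the four-protein count includes CagA (so "CagA plus at least three additional proteins" means CagA-positive with a total of at least four), and it uses the ABC Method's pepsinogen-defined CAG criterion (pepsinogen I ≤70 μg/L and pepsinogen I:pepsinogen II ratio ≤3). The `Participant` fields and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Participant:
    n_hp_proteins_positive: int  # seropositive H. pylori antigens (assumed to include CagA)
    caga_positive: bool
    pg1: float                   # serum pepsinogen I, μg/L
    pg2: float                   # serum pepsinogen II, μg/L


def cag_positive(p: Participant) -> bool:
    """Pepsinogen-defined CAG as in the ABC Method: PG I <= 70 μg/L and PG I:PG II <= 3."""
    return p.pg1 <= 70 and p.pg1 / p.pg2 <= 3


def sensitivity_serostatus(p: Participant) -> Optional[bool]:
    """True = seropositive, False = seronegative, None = excluded from the sensitivity set."""
    hp_positive_main = p.n_hp_proteins_positive >= 4  # main-analysis definition
    if not hp_positive_main and cag_positive(p):
        return None  # H. pylori seronegative but CAG+: possible past infection, excluded
    if p.caga_positive and hp_positive_main:
        return True   # concordant: CagA+ and >= 4 proteins
    if not p.caga_positive and not hp_positive_main:
        return False  # concordant: CagA- and < 4 proteins
    return None       # serodiscordant for H. pylori and CagA, treated as missing


if __name__ == "__main__":
    examples = [
        Participant(6, True, 50.0, 20.0),   # seropositive
        Participant(2, False, 50.0, 20.0),  # seronegative but CAG+: excluded
        Participant(5, False, 80.0, 10.0),  # serodiscordant: excluded
    ]
    for person in examples:
        print(person, sensitivity_serostatus(person))
```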
Discussion
Using data from a large East Asian nested case–control study, we built a predictive model of 10-year gastric cancer risk that incorporated detailed serum H. pylori protein and pepsinogen data as well as demographic and lifestyle risk factors. The new model had a greater classification AUC than the currently used ABC Method. In addition, LOOCV suggested that the Lasso model would retain its predictive capability in independent data: its discrimination metrics remained relatively stable, whereas the ABC Method's AUC was considerably lower than it was within the training data set. The ABC Method did have greater specificity than the Lasso model at the threshold of risk score >1.0. However, because the purpose of this early stage of screening is to identify high-risk individuals for early detection of gastric cancer, sensitivity is the more important measure of a predictive model's value. Similarly, it is noteworthy that the Lasso model had a higher NPV than the ABC Method, meaning that a smaller proportion of the individuals it classified as low risk went on to develop gastric cancer. Such false negatives (people who develop gastric cancer but are classified as low risk) are an especially dangerous error. When both models were examined at 75% specificity, the Lasso model also had greater sensitivity and NPV than the ABC Method, with comparable PPV. Furthermore, within strata of study cohort, which approximate real-world populations more closely than the international HpBCC nested case–control study does, the Lasso model exhibited a higher AUC than the ABC Method, as well as higher sensitivity and NPV when both models were compared at 75% specificity.
Variables included in the Lasso model
The Lasso-determined model included several H. pylori proteins that have been associated with greatly increased risk of gastric cancer in previous studies conducted in East Asia. HP 0305 and HP 1564 were both selected by Lasso as binary variables. Together, these two proteins have been associated with greater risk of gastric cancer in China (52). HP 0305 and HP 1564 may augment gastric cancer risk by increasing H. pylori-mediated inflammation, promoting key bacteria–gastric cell interactions that enable the delivery of oncogenic microbial cargo to vulnerable cells (53). In addition, HP 1564 has been found to translocate the oncoprotein CagA into gastric epithelial cells (53). UreA (urease alpha subunit), which contributes to H. pylori's ability to neutralize gastric acid and survive in the stomach, was also selected by Lasso (54). This selection is somewhat unexpected, because UreA has not been strongly associated with gastric cancer in previous studies. UreA may be strongly correlated with the presence of other H. pylori proteins that increase gastric cancer risk, such as HP 0305 and HP 1564. In a sensitivity analysis that excluded the continuous UreA variable, the model without UreA had an AUC similar to that of the model with UreA. Of note, our model did not include CagA, which had the strongest univariate association with gastric cancer in our data set and has previously been associated with high risk of gastric cancer (17). This may be because CagA seropositivity is extremely common among H. pylori-infected individuals in East Asia (8), which limits its ability to discriminate between cases and controls: in this data set, 87% of cases and 72% of controls were seropositive for CagA. VacA, another H. pylori protein that has been strongly associated with gastric cancer (including in this data set), was also not selected by Lasso (23, 28, 55).
Chronic atrophic gastritis was also selected as a predictor of gastric cancer in our model, likely because it represents a further step along the path from H. pylori infection to gastric adenocarcinoma (8, 56). CAG in the absence of H. pylori infection has been found to have an even stronger positive association with gastric cancer than when infection is present, possibly because it implies that the atrophy has become so severe that it has rendered the stomach inhospitable for the bacterium (6, 7).
In a sensitivity analysis in which we excluded participants who were seronegative for H. pylori and seropositive for CAG, or who were serodiscordant for H. pylori and CagA, Lasso selected pepsinogen I as a linear predictor in addition to the other six predictors. Including pepsinogen I substantially (and significantly) improved the predictive capability of the Lasso model when H. pylori seropositivity was defined this way. Pepsinogen I is produced only in the acid-secreting glands of the gastric corpus; hence, as gastric atrophy spreads (usually from the pylorus, where H. pylori tends to colonize), the amount of pepsinogen I measurable in serum decreases (15, 57, 58). This is consistent with the inverse association between serum pepsinogen I concentration and gastric cancer risk in our model. However, absolute serum pepsinogen I concentration has also been shown to increase with the mucosal inflammation caused by chronic H. pylori infection, an effect that works in the opposite direction (58, 59). The selection of both pepsinogen I and CAG by Lasso suggests that lower pepsinogen I values within strata of serologically defined CAG may also be informative of gastric cancer risk.
The components of our Lasso-generated model differed somewhat from those of a previous predictive model of 10-year gastric cancer risk developed using data from the JPHC II (49). Both models included age, gender, and CAG. However, the JPHC II model included a binary variable for H. pylori infection status and did not assess individual H. pylori proteins. It also included smoking status as a binary variable (current vs. former/never smokers); family history of gastric cancer, which was not selected by Lasso in our data set; and fish roe/fish gut consumption as a proxy measure for high salt intake. Our data set did not have a measure of salt consumption. The JPHC II model had a somewhat higher Harrell's c-index (analogous to the area under the ROC curve) than our model: 76.8% versus our model's predictive AUC of 73.37% (95% CI, 70.42%–76.32%; ref. 49). However, that model's sensitivity (69.7%) was lower than ours (73.56% at 60.38% specificity). This may be because the two studies used different criteria to choose a threshold balancing sensitivity and specificity: a 1.9% predicted probability of gastric cancer versus our criterion of a risk score >1.0. Comparison is also difficult because the JPHC II study did not report confidence intervals for its estimates. In external validation, the JPHC II model showed a higher c-index of 79.8% (95% CI, 72.5%–86.1%) and a sensitivity of 74% (60). However, that model's estimates are generally less precise than ours, likely owing to the relatively small number of gastric cancer cases in its analysis sets (412 in the derivation sample and 33 in the validation sample, vs. 708 in our study sample).
Limitations and strengths
We must acknowledge that measuring some of the variables included in the new Lasso model will be considerably more challenging than for the ABC Method. Currently, HP 0305, HP 1564, and UreA assays are only available in research laboratories. Technology transfer from such institutions to clinical laboratories will be necessary before any predictive model based on these H. pylori proteins could be implemented for widespread gastric cancer risk stratification. Commercial assays for pepsinogen are, however, readily available worldwide (61).
There were two potential sources of measurement error in this study. First, the multiplex serological test for seropositivity to H. pylori, defined as reactivity to ≥4 H. pylori proteins, had 89% sensitivity and 82% specificity, meaning that about 11% of truly infected individuals may have been recorded as seronegative and about 18% of uninfected individuals may have been recorded as seropositive (28). This misclassification may have affected which variables Lasso selected for the predictive model. However, these sensitivity and specificity estimates were calculated against an ELISA as the gold standard. Because ELISA is technically different from multiplex serology and is itself an imperfect indicator of current H. pylori infection, sensitivity and specificity below 100% are not necessarily cause for alarm (62). Second, although all gastric cancer cases included in this analysis were noncardia, we were not able to distinguish between histologic subtypes of gastric adenocarcinoma. Diffuse-type adenocarcinoma may have different H. pylori-related risk factors from intestinal-type adenocarcinoma; CagA seropositivity, for example, has been associated more strongly with the diffuse type than with the intestinal type (63, 64). Nevertheless, most H. pylori proteins have previously been associated with both histologic subtypes, which suggests that the findings of the Lasso model will be relevant to both (55).
In addition, the data included in this analysis were several decades old: the NIT blood samples were collected in 1985, and the JPHC I and II samples were collected from 1991 to 1993. The distribution of gastric cancer risk factors has likely changed in these populations since then, which may reduce the relevance of these findings to the present day.
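Sensitivity and specificity describe error rates among truly infected and truly uninfected people; the share of recorded seropositives who are actually uninfected (1 − PPV) also depends on infection prevalence. The sketch below applies Bayes' rule to the 89% sensitivity and 82% specificity cited above. The prevalence values are hypothetical and are not taken from this study, and the function name is illustrative.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """PPV and NPV of a binary test at a given true prevalence (Bayes' rule)."""
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos), true_neg / (true_neg + false_neg)


if __name__ == "__main__":
    # 89% sensitivity and 82% specificity are the multiplex-serology figures cited in the text;
    # the prevalence values below are hypothetical.
    for prevalence in (0.3, 0.5, 0.7):
        ppv, npv = predictive_values(0.89, 0.82, prevalence)
        print(f"prevalence {prevalence:.0%}: "
              f"{1 - ppv:.1%} of recorded seropositives uninfected, "
              f"{1 - npv:.1%} of recorded seronegatives infected")
```

At a hypothetical 50% prevalence, about 16.8% of recorded seropositives would be uninfected and about 11.8% of recorded seronegatives would be infected. These figures differ from the test's own 18% and 11% error rates and shift further as prevalence changes.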
Finally, we chose to code the variables in the predictive model in a simple linear or binary fashion despite the curvilinear univariate associations that some of them (including age and UreA) had with gastric cancer. Although this coding may not capture each variable's association with gastric cancer in the data set precisely, we believe that simpler coding reduces the likelihood that the model is overfit to the data in which it was constructed. Moreover, linear and binary coefficients are easier to interpret than more complex terms.
To our knowledge, this is the first study to use machine learning techniques to build a predictive model for gastric cancer using detailed H. pylori protein data. Our results suggest that incorporating host immune response to individual H. pylori proteins, rather than categorizing individuals simply by overall H. pylori infection status, improves the prediction of gastric cancer risk. Moreover, the large sample size of this data set and the large number of cases afforded by the nested case–control design improved statistical precision and enabled us to explore the model's performance in stratified analyses. Blood samples were collected at baseline, before any study participant was diagnosed with gastric cancer, which helps ensure that exposure measurement preceded disease onset.
Conclusion
Using machine learning techniques, we constructed a new predictive model for gastric cancer risk that incorporated host antibody response to the H. pylori proteins HP 0305, HP 1564, and UreA, serologically defined chronic atrophic gastritis, pepsinogen I, age, and gender. The new model exhibited improved AUC and sensitivity over an existing risk stratification model, the ABC Method. Improved noninvasive gastric cancer risk stratification may streamline the allocation of more invasive screening modalities, such as radiography or endoscopy, toward high-risk individuals who will benefit from them most and away from low-risk individuals who do not need them. This could not only save individuals and the healthcare system unnecessary expense but also promote patients' peace of mind, by encouraging high-risk individuals to seek further screening and reassuring low-risk individuals that they probably do not need such services. Ultimately, improved risk stratification could increase survival from this deadly disease in East Asia.
Authors' Disclosures
J.D. Murphy reports grants from NCI during the conduct of the study. J. Butt reports grants from Luminex Corporation outside the submitted work. M. Epplein reports grants from NIH during the conduct of the study. No disclosures were reported by the other authors.
Authors' Contributions
J.D. Murphy: Conceptualization, data curation, software, formal analysis, validation, visualization, methodology, writing–original draft, project administration, writing–review and editing. A.F. Olshan: Conceptualization, supervision, project administration, writing–review and editing. F.-C. Lin: Conceptualization, software, supervision, validation, methodology, writing–review and editing. M.A. Troester: Conceptualization, supervision, methodology, writing–review and editing. H.B. Nichols: Supervision, writing–review and editing. J. Butt: Conceptualization, data curation, software, investigation, methodology, writing–review and editing. Y.-L. Qiao: Resources, data curation, investigation, writing–review and editing. C.C. Abnet: Resources, data curation, investigation, writing–review and editing. M. Inoue: Resources, data curation, investigation, writing–review and editing. S. Tsugane: Resources, data curation, investigation, writing–review and editing. M. Epplein: Conceptualization, resources, supervision, funding acquisition, methodology, project administration, writing–review and editing.
Acknowledgments
John D. Murphy was supported by grant T32 CA057726/CA/NCI NIH HHS/United States.
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.