Abstract
Risk prediction models for gastric cancer could identify high-risk individuals in the general population. The objective of this study was to systematically review the available evidence on the construction and verification of gastric cancer predictive models. We searched the PubMed, Embase, and Cochrane Library databases for articles that developed or validated gastric cancer risk prediction models up to November 2021. Data extracted included study characteristics, predictor selection, missing data, and evaluation metrics. Risk of bias (ROB) was assessed using the Prediction model Risk Of Bias ASsessment Tool (PROBAST). We identified a total of 12 original risk prediction models that fulfilled the criteria for analysis. The area under the receiver operating characteristic curve (AUC) ranged from 0.73 to 0.93 in derivation sets (n = 6), 0.68 to 0.90 in internal validation sets (n = 5), and 0.71 to 0.92 in external validation sets (n = 7). The higher-performing models usually included age, salt preference, Helicobacter pylori, smoking, body mass index, family history, pepsinogen, and sex. According to PROBAST, at least one domain with a high ROB was present in all studies, mainly due to methodologic limitations in the analysis domain. In conclusion, although some risk prediction models including similar predictors have displayed sufficient discriminative ability, many have a high ROB due to methodologic limitations and have not been efficiently validated externally. Future prediction models should adhere to well-established standards and guidelines to benefit gastric cancer screening.
By systematically reviewing the available evidence on the construction and verification of gastric cancer predictive models, we found that most models have a high ROB due to methodologic limitations and have not been efficiently validated externally. Future prediction models should adhere to well-established standards and guidelines to benefit gastric cancer screening.
Introduction
According to the latest report of Global Cancer Statistics 2020 (GLOBOCAN 2020), gastric cancer is the fifth most commonly diagnosed cancer and the fourth leading cause of cancer death (1). Endoscopic examination with biopsy is the gold standard for the diagnosis of gastric cancer and has been universally accepted and recommended for routine screening (2, 3). However, its high cost, invasiveness, and technical demands severely restrict large-scale application, particularly in countries with low incidence rates or limited medical resources. Thus, there is an urgent need to develop efficient methods that enable early identification of high-risk groups for subsequent endoscopy.
Risk prediction models, which are designed to estimate future risk based on current and past information (4), can be used for risk stratification in screening programs at the population level. The development of gastric cancer is a protracted process influenced by genetic and environmental factors and accompanied by changes in serologic biomarkers, such as Helicobacter pylori (HP) antibody (5), pepsinogen I (PGI), and II (PGII) (6). This provides a basis for the construction of gastric cancer risk prediction models.
An ideal prediction model should perform well in both discrimination and generalization. Previous studies targeting other diseases, such as oropharyngeal cancer (7), esophageal cancer (8), and breast cancer (9), have suggested that a high risk of bias due to inappropriate predictors or methodology can restrict the applicability and predictive power of prediction models. Therefore, assessment of the risk of bias (ROB) and of applicability to the intended population and setting is highly warranted for gastric cancer risk prediction models. Although recent years have seen an increasing number of risk prediction models for gastric cancer, a comprehensive evaluation of model development and validation has not been performed.
In this study, we sought to systematically review the construction and validation of multivariable risk prediction models used to predict the risk of developing gastric cancer in the general population. The study aimed to identify the most common predictors incorporated into prediction models and to evaluate study characteristics associated with model performance to guide future model development.
Materials and Methods
The present systematic review was carried out following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines (10).
Search strategy
We exhaustively searched the PubMed (https://www.ncbi.nlm.nih.gov/pubmed), Embase (https://www.embase.com/), and Cochrane Library (https://www.cochranelibrary.com/) databases for multivariable risk prediction models of gastric cancer. Articles were searched up to November 2021, and only those published in English were included. Search terms were adapted from previous studies (11, 12), and detailed retrieval strategies are provided in the Supplementary Appendix (Supplementary Tables S1–S3). Two independent reviewers (Jianhua Gu and Ru Chen) examined the titles and abstracts of the initially screened articles against the following selection criteria: (i) the article related to gastric cancer prevention or prediction; (ii) a new algorithm or risk prediction model was developed in the general population; and (iii) the article was not an animal study, review, or meta-analysis. In addition, we checked the reference lists of eligible articles to ensure that important articles were not missed.
Data collection
The following information was extracted from the included articles: first author, publication year, study design, study population (country, ethnicity, age and sex of participants, sample size), number of events, and model-related information (statistical methods, model performance, model-building strategies, validation method, and predictors in the final analysis). According to the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) guidelines (13), we classified the methods of the eligible articles published for each risk prediction model. If a study established separate models by age or sex, each was included separately.
Quality assessment
We used the Prediction model Risk Of Bias ASsessment Tool (PROBAST) to assess the ROB and applicability of the included studies. The PROBAST involves the assessment of four domains: participants, predictors, outcome, and analysis. These domains further contain 20 signaling questions to facilitate structured judgment of risk of bias (4).
Data availability statement
Data sharing is not applicable to this article as no data were created or analyzed in this study.
Results
Of 774 total citations (205 in PubMed, 381 in Embase, 188 in the Cochrane Library), 12 full-text articles met the criteria and were included in the final analysis (Fig. 1; refs. 14–25). Of these, one developed two sex-stratified models (17) and one was a validation study (15, 16); thus, a total of 12 models for gastric cancer risk prediction were identified.
Flowchart of literature-screening process. This figure displays the flow of the literature selection applying the systematic search and selection strategies to identify eligible studies.
Description of studies
The primary characteristics of the included studies are summarized in Table 1. Eleven studies originated from East Asia [five from Japan (15, 16, 18, 19, 21), three from China (14, 23, 25), two from Korea (17, 22), and one from Singapore (24)], while only one was from America (20). Of these, six were cohort studies (including a nested case–control study; refs. 15–19, 24), four were case–control studies (20–23), and the remaining two were cross-sectional studies (14, 25). Sample sizes ranged from 140 to 2,176,501. Of the 12 models, six were diagnostic prediction models using logistic regression (14, 20, 22, 23, 25) or machine learning (21), and six were prognostic prediction models using Cox regression (15–19, 24).
Summary of included studies.
Authors (year) . | Country . | Study design . | Study period . | Method . | Study typea . | Samples . | Cases . | Age . | Male (%) . | c-statistics . |
---|---|---|---|---|---|---|---|---|---|---|
Cai Q (2019) | China | Cross-sectional study | 2015–2017 | Logistic | 2a | 9,383(D) 5,091(EV) | 267(D) 138(EV) | 56.2 ± 9.6 | 46.94 | 0.76(D) 0.76(IV) 0.73(EV) |
Charvat H (2016) | Japan | Cohort study | 1993–2009 | Cox | 1b | 19,028 | 412 | 59.3(51.5–64.9) (cases) 59.3(57.9–67.1) (non-cases) | 61.9(cases) 35.7(non-cases) | 0.77(IV) |
Charvat H (2020) | Japan | Cohort study | 1990–1993 | Cox | 4 | 1,292 | 33 | 56.52±5.78 | 34.1 | 0.80(EV) |
Eom BW (2015) | Korea | Cohort study | 1996–2007(D) | Cox (men) | 3 | 1,372,424(D) 484,335(EV) | 19,465(D) 6,628(EV) | 45.1±10.5(men)(D) 46.8±12.8(men)(EV) | 77.8(D) 51.0(EV) | 0.78(EV) |
1998–2007(EV) | Cox (women) | 3 | 804,077(D) 466,013(EV) | 5,579(D) 2,920(EV) | 48.7±11.0(women)(D) 51.1±12.1(women)(EV) | 0.71 (EV) | | | |
Iida M (2018) | Japan | Cohort study | 1988–2002(D) 2002–2007(EV) | Cox | 3 | 2,444(D) 3,204(EV) | 90(D) 35(EV) | 58±11(D) 62±13(EV) | 41.6(D) 42.1(EV) | 0.79(D) 0.76(EV) |
Ikeda F (2016) | Japan | Cohort study | 1988–2008 | COX | 1a | 2,446 | 123 | 58.3 ± 11.4 | 41.54 | 0.77(D) |
In H (2020) | America | Case–control | NR | Logistic | 1a | 140 | 40 | 40–79 | 24(cases) 50(non-cases) | 0.94(D) |
Lee DS (2009) | Korea | Case–control | 2005.3–8 | Logistic | 1b | 382 | 183 | >35 | 65(cases) 47.7(non-cases) | 0.90(IV) |
Qiu L (2020) | China | Case–control | 2009–2011 | Logistic | 1b | 2,287 | 1,115 | ≤59: 569 (51%) (cases) ≤59: 593 (51%) (non-cases) | 71.1(cases) 70.1(non-cases) | 0.684(IV) |
So JBY (2021) | Singapore and Korea | Nested case–control study | NR | Cox | 3 | 472(D) 210(EV) | 236(D) 94(EV) | 61.2±8.4(D)(cases) 68.0±10.9(D)(non-cases) | 63.3(D)(cases) 62.7(D)(non-cases) | 0.93(D) 0.92(EV) 0.85(0.81–0.88) (EV) |
Taninaga J (2019) | Japan | Case–control study | 2006–2017 | Machine learning | 2a | 1,144(D) 287(EV) | 89 | 56.7±8.8(cases) 46.2±1.0(non-cases) | 78.06 | 0.87(IV) 0.90(EV) |
Tu H (2017) | China | Cross-sectional study | 1997–2012 | Logistic | 1a | 9,002 | 94 | 61.2±11.4(cases) 50.7±10.1(non-cases) | 72.3(cases) 47(non-cases) | 0.80(D) |
Note: 1a, development only; 1b, development and validation using resampling; 2a, random split-sample development and validation; 2b, nonrandom split-sample development and validation; 3, development and validation using separate data; 4, validation study.
Abbreviations: D, derivation; EV, external validation; IV, internal validation; NR, Not reported.
aType of study according to the TRIPOD guidelines.
Model characteristics
The number of events per predictor variable (EPV) was defined as the number of gastric cancer events divided by the number of predictor variables. Fewer than half of the models (41.7%) had EPVs greater than 20 (14, 15, 17, 23), six were between 10 and 20 (18, 19, 21, 22, 24, 25), and only one model had an EPV less than 10 (20). Except for one model that did not state how continuous variables were handled (21), continuous variables were retained in only two models (16.7%; refs. 23, 24) and were converted to categorical form in nine models (75.0%; refs. 14–20, 22, 25). Six models reported how missing values were handled in the derivation dataset (14, 17, 20, 22, 23), whereas missing-value handling was not discussed in the rest. Of these, three models removed records containing missing values (14, 20, 23) and three imputed missing data (17, 23). Seven models were presented completely, of which four used a complete regression equation (15, 17, 22) and three were transformed into a risk scoring system (Table 2; refs. 14, 18, 25).
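The EPV ratio defined above is simple enough to sketch in code. The following Python snippet is an illustration only (not code from any reviewed study); the example event and predictor counts are hypothetical.

```python
# Illustrative sketch: events-per-variable (EPV), a rough guard
# against overfitting in multivariable prediction models.
def events_per_variable(n_events: int, n_predictors: int) -> float:
    """EPV = number of outcome events / number of candidate predictors."""
    if n_predictors <= 0:
        raise ValueError("need at least one predictor")
    return n_events / n_predictors

# Hypothetical example: 90 gastric cancer events modeled with 6 predictors.
print(events_per_variable(90, 6))  # 15.0 -> falls in the 10-20 band
```

Under the rule of thumb discussed later in this review, an EPV above 20 suggests the model is less likely to be overfitted.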
Summary of model characteristics predicting gastric cancer.
Characteristics of analysis . | Number of models (%) . |
---|---|
EPV in final model | |
<10 | 1 (8.3) |
10–20 | 5 (41.7) |
≥21 | 6 (50) |
Handling of continuous variables | |
Transforming to multi-categorical variables | 9 (75.0) |
Keep continuous variables | 2 (16.7) |
NR | 1 (8.3) |
Variable selection | |
Clinical experience and statistical analysis | 9 (75.0) |
NR | 3 (25.0) |
Missing values | |
Replaced using imputation | 3 (25) |
Complete case analysis | 2 (16.7) |
NR | 7 (58.3) |
Model performance | |
Discrimination | 12 (100.0) |
Calibration (Hosmer–Lemeshow test) | 6 (50.0) |
Validation | |
Internal | 2 (16.7) |
External | 4 (33.3) |
Internal and external | 3 (25.0) |
NR | 3 (25.0) |
Internal validation | |
Cross-validation | 1 (8.3) |
Bootstrap | 4 (33.3) |
Model presentation | |
Using complete regression equation | 4 (33.3) |
Transformed into risk scoring system | 3 (25.0) |
Complete model not reported | 5 (41.7) |
Abbreviations: EPV, number of events per predictor variable; NR, not reported.
Model development and validation
All models evaluated discriminative ability with derivation c-statistics, and half of the models assessed calibration with the Hosmer–Lemeshow test (14, 15, 17, 18, 22). As shown in Fig. 2, the AUCs ranged from 0.73 to 0.93 in the derivation sets (n = 6), 0.68 to 0.90 in the internal validation sets (n = 5), and 0.71 to 0.92 in the external validation sets (n = 7). Five of the seven externally validated models were developed and validated using separate data (15, 17, 18, 24), while the remaining two randomly divided the whole dataset into a training set and a test set (14, 21). Four models performed internal validation by bootstrapping (14, 15, 22, 23) and one by cross-validation (21). Only one model, developed in a Chinese cross-sectional study, reported the AUC in the derivation cohort (AUC = 0.76), the internal validation cohort (AUC = 0.76), and the external validation cohort (AUC = 0.73; ref. 14).
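As a hedged illustration of the discrimination metric reported above: the c-statistic equals the probability that a randomly chosen case receives a higher predicted risk than a randomly chosen non-case (ties counting half). The predicted risks and outcome labels below are hypothetical toy data, not values from the included studies.

```python
# Minimal sketch of the c-statistic (equivalent to the AUC for binary
# outcomes): the proportion of concordant case/non-case pairs.
def c_statistic(risks, labels):
    pairs = concordant = 0.0
    for ri, li in zip(risks, labels):
        for rj, lj in zip(risks, labels):
            if li == 1 and lj == 0:      # one case paired with one non-case
                pairs += 1
                if ri > rj:
                    concordant += 1
                elif ri == rj:
                    concordant += 0.5    # ties count half
    return concordant / pairs

# Hypothetical predicted risks and observed gastric cancer outcomes.
risks = [0.9, 0.8, 0.3, 0.2, 0.1]
labels = [1, 0, 1, 0, 0]
print(c_statistic(risks, labels))  # 5/6 ≈ 0.833
```

The O(n²) pairwise form is shown for clarity; rank-based formulations compute the same quantity more efficiently on large cohorts.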
The AUC of risk prediction models in the derivation cohort, internal validation cohort, and external validation cohort. Note: Charvat H (2020) was a validation study that can be seen as complementary to Charvat H (2016). Hence, we merged the two studies to analyze them. Eom BW (2015) developed two sex-stratified models, so we analyzed two models separately. AUC, area under the receiver operating characteristic curve.
Predictors in the models
Across all models, there were 30 distinct predictors, which could be divided into four categories: demographic, clinical, lifestyle, and laboratory variables (Fig. 3). Thirteen predictors appeared two or more times, including three demographic variables [age, sex, body mass index (BMI)], five lifestyle variables (salt preference, smoking, alcohol, physical activity, meal regularity), four laboratory variables (HP, PG, hemoglobin A1c level, gastrin-17), and one clinical variable (family history). The predictors for each study across these categories are detailed in Supplementary Table S4. Age was the most frequently selected variable, appearing nine times. The second-ranking variables, a high-salt diet and HP, each appeared six times. No other predictor was used in more than 50% of the models. The type of variables selected was strongly dependent on the purpose and design of each study.
Classification of predictors identified in risk prediction models for gastric cancer. Note: Charvat H (2020) was a validation study that can be seen as complementary to Charvat H (2016). Hence, we merged the two studies to analyze them. Eom BW (2015) developed two sex-stratified models, so we analyzed two models separately.
Study quality and ROB
Table 3 and Fig. 4 summarize the ROB for all included studies. Overall, two models were at a high ROB for participants owing to unreasonable inclusion criteria (21, 22), one was at a high ROB for predictors (22), and one was at a high ROB for outcomes (22). In the analysis domain, all studies were rated "high risk" because the handling of missing values or overfitting was unsatisfactory in most studies. The details of the ROB for each study across these domains are presented in Supplementary Fig. S1. Two models were assessed as high concern regarding applicability: one included hospital-based cases in a case–control study (22), and the other lacked a clear pathologic diagnosis for a small selection of patients who underwent endoscopic examination (21).
Risk of bias and applicability of included studies using PROBAST.
. | ROB . | Applicability . | Overall . | ||||||
---|---|---|---|---|---|---|---|---|---|
. | Participants . | Predictors . | Outcome . | Analysis . | Participants . | Predictors . | Outcome . | Risk of bias . | Applicability . |
Cai Q (2019) | + | + | + | − | + | + | + | − | + |
Charvat H (2016) | + | + | + | − | + | + | + | − | + |
Charvat H (2020)a | |||||||||
Eom BW (2015) | + | + | + | − | + | + | + | − | + |
Iida M (2018) | + | + | + | − | + | + | + | − | − |
Ikeda F (2016) | + | + | + | − | + | + | + | − | + |
In H (2020) | + | + | + | − | + | + | + | − | + |
Lee DS (2009) | − | − | − | − | − | + | + | − | − |
Qiu L (2020) | + | + | + | − | + | + | + | − | + |
So JBY (2021) | + | + | + | − | + | + | + | − | + |
Taninaga J (2019) | − | + | + | − | − | + | + | − | − |
Tu H (2017) | + | + | + | − | + | + | + | − | + |
Abbreviation: ROB, risk of bias.
aCharvat H (2020) was a validation study that can be seen as complementary to Charvat H (2016). Hence, we merged the two studies to analyze them. + indicates low ROB/low concern regarding applicability; − indicates high ROB/high concern regarding applicability.
A, Risk of bias assessment according to the PROBAST. B, Applicability assessment according to the PROBAST.
Discussion
In this systematic review, we synthesized and evaluated the prediction performance, predictors, and ROB of 12 gastric cancer risk prediction models developed at the population level. Although most models presented high discriminative ability, fewer than half were externally validated using independent datasets. The models with high discriminative ability tended to include age, salt preference, HP, smoking, BMI, family history, pepsinogen, and sex. At least one domain with a high ROB was present in every study, mainly due to methodologic limitations in the analysis domain, which resulted in an overall high ROB for each model assessed by PROBAST.
Models warranting promotion in clinical practice or primary care should have good discriminative and generalization capability. When the same dataset is used for both development and validation, performance is usually overestimated because of overfitting. Therefore, generalization capability should be evaluated with a testing dataset independent of model construction (26). In addition, internal validation, which provides a more objective estimate of discriminative capacity and stability, is also a crucial step in model evaluation (4). In our review, only five of the seven externally validated models were developed and validated using separate data, while the remaining two randomly divided the whole dataset into a training set and a testing set. Although splitting the dataset is an acceptable approach for external validation, a better approach is to use one or more independent datasets. In addition, only a few studies performed internal validation, and fewer considered both internal and external validation. Similar concerns have been identified in modeling studies of breast cancer (9) and lung cancer (27). Furthermore, only a limited number of studies provided complete risk equations; the remaining models are unlikely to be independently externally validated, because complete risk equations are required for risk assessment. Therefore, further external validation of existing models using independent datasets is required to guarantee their generalizability.
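The random split-sample approach mentioned above can be sketched as follows. This is an illustrative fragment only; the 70/30 split fraction, the cohort size of 100, and the fixed seed are assumptions for demonstration, not values from the reviewed studies.

```python
import random

# Sketch of random split-sample validation: partition record indices
# into a derivation (training) set and a held-out validation set.
# Independent external datasets remain preferable, as argued above.
def split_sample(n_records: int, train_frac: float = 0.7, seed: int = 0):
    idx = list(range(n_records))
    random.Random(seed).shuffle(idx)   # seeded for reproducibility
    cut = int(n_records * train_frac)
    return idx[:cut], idx[cut:]        # (derivation, validation) indices

train_idx, test_idx = split_sample(100)
print(len(train_idx), len(test_idx))   # 70 30
```

Because the two halves come from the same source population, performance on the held-out split still tends to overstate how the model will transfer to a genuinely different setting.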
A suitable prediction model should consist of plausible variables that have been epidemiologically and clinically associated with gastric cancer. There is substantial evidence that the incidence of gastric cancer in the general population increases with age and is higher in males, especially those over 40 years old (28, 29). HP infection is the strongest known risk factor for the development of gastric cancer (30), and a high-salt diet (31), a family history of gastric cancer (32), and smoking (28) are also associated with increased risk. These factors, which provide a theoretical basis for model construction, are the most common variables in gastric cancer models. In addition, the type of variables selected was strongly dependent on the purpose and design of each study. Except for three exploratory studies that used single-type variables such as genotypes or miRNAs, the other models strove to incorporate multiple types of factors to identify high-risk groups. The models developed by In and colleagues (20) and Lee and colleagues (22) had the highest prediction performance in derivation and internal validation, respectively. However, these models were not externally validated, and their predictors were dominated by lifestyle and demographic factors assessed only by questionnaires, lacking specific diagnostic indicators such as HP infection or serum pepsinogen. Atrophic gastritis caused by HP affects the secretion of PGs and gastrin, which serve as promising biomarkers of the status of the gastric mucosa (33). Asian countries with a high incidence of gastric cancer, such as China and Japan, have implemented screening programs using a combination of HP antibody status and serum PG concentration (34, 35).
Although most of these serological markers are relatively difficult to obtain in primary and community care settings, they offer higher specificity and improve generalization performance (25).
Our ROB assessment showed that all 12 gastric cancer models scored a high ROB in the analysis domain of the PROBAST tool because poor methods were often employed and many analyses were inadequately reported. Beyond the absence of effective external validation, another issue worth mentioning is the handling of missing values. Six models reported how missing values were handled in the derivation dataset. Of these, three used multiple imputation (17, 23) and three removed records with missing values (14, 20, 23). Compared with multiple imputation, deleting records with missing values decreases the discriminatory power of multivariable models and leads to biased effect estimates by making the sample less representative (36). Furthermore, the EPV has been studied in relation to model performance, and a model with an EPV higher than 20 is less likely to be overfitted (37). Our review showed that only six of 12 models met this criterion, implying that the remaining models were likely to have been overfitted. For low-prevalence predictors in a model, such as occupational exposures and lymphocyte ratio, an even higher EPV is needed to reduce bias in regression coefficients and improve predictive accuracy.
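The contrast between deleting incomplete records and imputing them can be illustrated with a toy example. The BMI values below are hypothetical, and simple mean imputation stands in for the multiple imputation used in the cited studies; this is a demonstration of the sample-size effect only, not a recommended imputation strategy.

```python
# Toy contrast: complete-case deletion shrinks the analysable sample,
# whereas imputation retains every record (here via naive mean filling).
def complete_case(values):
    """Drop records whose predictor value is missing (None)."""
    return [v for v in values if v is not None]

def mean_impute(values):
    """Replace missing values with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]

bmi = [22.0, None, 27.0, None, 23.0]    # hypothetical BMI measurements
print(len(complete_case(bmi)))           # 3 -> only 3 of 5 records survive
print(mean_impute(bmi))                  # [22.0, 24.0, 27.0, 24.0, 23.0]
```

Even this crude example shows why deletion makes the sample less representative when missingness is not random; multiple imputation additionally propagates the uncertainty of the filled-in values.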
Strengths and limitations
The main strength of this review was that it provided a comprehensive mapping of the available studies on risk prediction models for gastric cancer in the general population. We used robust methodology to synthesize the evidence by following the TRIPOD guidelines and thus provided a detailed description of the features of each gastric cancer model. In addition, the ROB of the included studies was assessed according to PROBAST, which is a new tool for assessing the methodologic quality of risk prediction models, to provide a better understanding of the overall quality of the current gastric cancer models.
Several limitations exist in the present systematic review. First, we were unable to perform subgroup analyses by anatomic site because almost none of the included studies provided typing information. Given that the etiologies of gastric cardia and noncardia cancer differ (38), it would be inappropriate to predict high-risk populations for the two subgroups with the same model. Second, owing to the lack of evaluation metrics such as 95% confidence intervals, we did not perform a meta-analysis of prediction performance in model development and validation. Third, we only included full-text articles published in English, which may have introduced publication bias.
Conclusion
In conclusion, although some risk prediction models incorporating similar predictors have displayed sufficient discriminative ability, many have a high ROB due to methodologic limitations and have not been efficiently validated externally. Up-to-date tools and guidelines, such as PROBAST and TRIPOD, provide practical standards for model construction and validation. To build more reliable gastric cancer risk prediction models for the general population, future studies should learn from previous work and adhere to well-established methodologic standards and guidelines developed specifically for prediction models.
Authors' Disclosures
No disclosures were reported.
Authors' Contributions
J. Gu: Writing–original draft, writing–review and editing. R. Chen: Funding acquisition, writing–original draft, writing–review and editing. S.-M. Wang: Writing–review and editing. M. Li: Data curation, investigation. Z. Fan: Investigation. X. Li: Supervision. J. Zhou: Investigation. K. Sun: Supervision, funding acquisition. W. Wei: Conceptualization, supervision.
Acknowledgments
We thank all authors for their contributions to this study. This study is supported by the Beijing Natural Science Foundation https://doi.org/10.13039/501100005089 (7204294, to R. Chen), National Natural Science Foundation of China https://doi.org/10.13039/501100001809 (81903403, to R. Chen, 81974493, to W. Wei), and the National Key R&D Program of China (2016YFC0901404, to W. Wei, 2018YFC1311704, to K. Sun).
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.