Abstract
There is an urgent need for cost-effective, non-invasive tools to detect early stages of gastrointestinal cancer (colorectal, gastric, and esophageal cancers). Despite many publications suggesting that circulating metabolites can act as accurate cancer biomarkers, few have reached the clinic. This is critically important in upper gastrointestinal cancer, for which there is no test to complement gold-standard endoscopic evaluation in patients with mild symptoms that do not meet referral criteria. This study therefore aimed to describe and address this translational gap. Studies published from 2007 to 2020 reporting the diagnostic accuracy of blood-based metabolomic biomarkers of gastrointestinal cancer were systematically reviewed, and the progress of each biomarker along the discovery–validation–adoption pathway was mapped. Successful biomarker translation was defined as a composite endpoint comprising patent protection, FDA approval, or recommendation in national guidelines. The review identified 77 biomarker panels for gastrointestinal cancer, including 25 with an AUROC >0.9. All but one were stalled at the discovery phase, 9.09% were patented, and none were clinically approved, confirming the extent of the biomarker translational gap. In addition, there were numerous “re-discoveries,” including histidine, which was identified in 7 colorectal cancer studies. Overall, this study quantitatively supports the presence of a translational gap between discovery and clinical adoption, despite clear evidence of high-performing biomarkers with significant potential clinical value.
Introduction
Gastrointestinal cancers (esophageal, gastric, and colorectal cancers) are among the most common cancer types worldwide and are associated with high mortality and morbidity (1). Clinical evaluation and computer models based on clinical features have limited predictive value for underlying gastrointestinal malignancy in curable early stages, when symptoms are either absent or non-specific (2). Endoscopy is the gold-standard approach for gastrointestinal cancer diagnosis, but it is an invasive, high-cost procedure, which precludes its use as an unselected screening modality (3). In the UK, the fecal immunochemical test (FIT) is currently used for colorectal cancer screening, whereas there is no national screening program for esophago-gastric cancer (4). Thus, there is an evident need to develop cost-effective, non-invasive tools to detect early stages of gastrointestinal cancer, irrespective of symptomatology, in a primary care setting.
Serum metabolites can be measured with a simple blood test and have potential for point-of-care use in diagnostics, surveillance, and treatment monitoring (5, 6). We previously reviewed progress between 2007 and 2014 in developing gastrointestinal cancer blood tests based on metabolic biomarkers measured using mass spectrometry (MS) and nuclear magnetic resonance spectroscopy (NMR; ref. 7). Despite the large number of publications on metabolic biomarkers, only one biomarker had been commercialized, regardless of clinical utility (1 of 29 publications reviewed by Antonowicz and colleagues in 2016; refs. 7, 8). This is consistent with Borrebaeck and colleagues (9), who found that only 50 cancer biomarkers have been FDA approved, compared with >10,000 discovered. This suggests a fundamental translational gap between biomarker discovery and implementation.
The route for biomarker translation has been described previously as a sequential pathway comprising five stages: biomarker discovery, assay validation, clinical validation, clinical evaluation, and biomarker implementation (10). The exceptionally small number of clinically implemented metabolic biomarkers suggests significant impediments along this pathway (10–13). Historically, there has been an excessive number of biomarker discovery studies, with little progress toward implementation (7). There is therefore a clear need to understand why more than 20 years of metabolic biomarker discovery work has not led to meaningful clinical impact.
The objective of this review is to describe the barriers to clinical implementation of circulating metabolic biomarkers in gastrointestinal cancer. The specific aims are to (i) evaluate the success of candidate biomarkers, in particular their incorporation into certified tests, patents, or guidelines, (ii) assess the progress of biomarkers along the validation pathway, (iii) provide a longitudinal analysis of methodological and reporting quality, and (iv) highlight the best available candidates and their current gaps to implementation, to inform a clearer path to impactful clinical tools.
Materials and Methods
Systematic search
First, our previous systematic review (2007–2014) was updated by identifying new studies published from 2015 onwards. Two authors (K.-V. Savva and B. Das) performed independent systematic literature searches as previously described (7), using the same inclusion and exclusion criteria. The following search terms were used: “esophagus” or “colon” or “colorectal” or “stomach” or “small intestine” or “large intestine” and “carcinoma” or “tumor” or “neoplasm” or “cancer” and “metabotyping” or “metabolic profiling” and “MS” or “magnetic resonance spectroscopy” or “NMR spectroscopy,” together with their respective medical subject headings (MeSH). The search was conducted in Medline, Web of Science, the Cochrane database, EMBASE, and PubMed. Titles, abstracts, and full texts were screened according to PRISMA guidelines (Fig. 1; Supplementary Table S1). Studies were considered for inclusion if blood metabolites of patients with gastrointestinal cancer were identified with NMR or MS. Studies were excluded if they were written in a language other than English, included non-blood specimens, did not include a non-diseased cohort, used an in vitro methodology, had too long an interval between the index test and the reference standard, or evaluated hepatobiliary or pancreatic cancers. Disagreements were discussed among K.-V. Savva, B. Das, S. Antonowicz, and C.J. Peters until unanimous agreement was reached.
Data extraction
The following domains were extracted by K.-V. Savva and B. Das: (i) institution(s) and dates of research, (ii) hypothesis, (iii) number of patients, controls, and samples, and type of cancer(s), (iv) metabolomic platform (global or targeted approach), (v) sample collection and preparation procedures, (vi) diagnostic metrics [sensitivity/specificity and/or area under the receiver operating characteristic curve (AUROC)] of the principal biomarkers/biomarker composite, (vii) statistical analysis, (viii) results, including diagnostic accuracy, and (ix) conclusions and applicability.
Analysis of biomarker translation
Biomarker success can be defined by eventual clinical implementation (10, 13). In this study, biomarker success was therefore defined as a composite endpoint comprising national regulatory body approval for clinical use (e.g., FDA, NICE, and EMA) and/or successful patent protection. Patent status provides a surrogate indicator of biomarker success, as it is a prerequisite for clinical implementation of a biomarker. Public patent databases (Google Patents, Espacenet, and Patentscope) were searched by author, institution, and biomarker to assess patent status. All identified biomarkers were separated into two groups: patented and non-patented. Journal impact factors for all publications, based on year of publication, were retrieved from an online database (https://www.scijournal.org/). Cox multivariate analysis was used to assess the relationship between patent status and publication frequency, taking into account year of publication, funding, impact factor, and research group. The Mann–Whitney U test was used to compare STARD and CAWG-MSI scores between patented and non-patented biomarkers. All statistical analyses were conducted using SPSS version 25 (IBM).
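To make the patented versus non-patented comparison concrete, the sketch below implements a two-sided Mann–Whitney U test in plain Python, using average ranks for ties and a tie-corrected normal approximation without continuity correction. It is an illustration only, not the SPSS procedure used in this study; statistical packages may report exact p-values for small samples, so results can differ. The function names and the STARD scores in the usage line are hypothetical.

```python
import math


def average_ranks(values):
    """Assign 1-based ranks; tied values share the mean of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (tie-corrected normal approximation,
    no continuity correction). Returns (U for x, U for y, approximate p-value)."""
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both groups must be non-empty")
    n = n1 + n2
    pooled = list(x) + list(y)
    ranks = average_ranks(pooled)
    u_x = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    u_y = n1 * n2 - u_x
    # Tie correction: sum of (t^3 - t) over each group of t tied values.
    tie_sizes = {}
    for v in pooled:
        tie_sizes[v] = tie_sizes.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in tie_sizes.values())
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u_x, u_y, 1.0  # every value tied: no evidence of a difference
    z = (u_x - n1 * n2 / 2) / math.sqrt(var_u)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return u_x, u_y, p


# Hypothetical STARD scores, not taken from the review.
patented_scores = [18, 16, 20]
non_patented_scores = [17, 15, 19, 14, 16]
print(mann_whitney_u(patented_scores, non_patented_scores))
```

The same function applies unchanged to CAWG-MSI scores; only the input lists change.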
Analysis of biomarker development
The evolution of biomarker research programs during the study period was tracked through web searches of biomarker and author names, recording author details, publication dates, journal impact factor, patent status, and funding details. Composite diagnostic models based on several metabolites were treated as a single biomarker panel. All relevant articles were manually grouped by biomarker, and the total number of biomarkers identified and the publication frequency of each biomarker were recorded. A frequency bar chart was constructed to display the publication frequencies for all biomarkers. For each biomarker, the time difference (years) between the first publication and each subsequent publication was calculated, allowing a fair comparison between biomarkers first published in different years.
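The grouping and time-alignment step can be sketched as follows, assuming each article has already been reduced to a (biomarker, publication year) record. The record format, function name, and example data are hypothetical; the sketch only shows how publication frequency and years since first publication could be derived per biomarker.

```python
from collections import defaultdict


def publication_timeline(records):
    """Summarise (biomarker, publication_year) records per biomarker.

    Returns {biomarker: {"frequency": n, "years_since_first": [...]}}, where each
    offset is measured from that biomarker's own first publication year.
    """
    years_by_biomarker = defaultdict(list)
    for biomarker, year in records:
        years_by_biomarker[biomarker].append(year)
    summary = {}
    for biomarker, years in years_by_biomarker.items():
        years.sort()
        first_year = years[0]
        summary[biomarker] = {
            "frequency": len(years),
            "years_since_first": [year - first_year for year in years],
        }
    return summary


# Hypothetical records for illustration only.
records = [("panel_A", 2012), ("panel_A", 2017), ("panel_B", 2016), ("panel_A", 2019)]
timeline = publication_timeline(records)
```

Aligning each biomarker to its own first publication (offset 0) is what allows a panel first described in 2008 to be compared with one first described in 2016.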
The frequency with which individual metabolites were identified as potential diagnostic biomarkers, either within multi-metabolite panels or in isolation, was recorded. When more than one metabolite was measured, metabolites were included only if multiplicity was accounted for: where the original analysis did not correct for multiple comparisons, we applied a Bonferroni correction. If original P values were not reported, and therefore could not be corrected, the stated metabolites were excluded.
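The inclusion rule above can be expressed as a short filter. This is a sketch, assuming per-metabolite P values are available as a dictionary (with None where a study did not report them) and that the number of compared features is known; the function name, default alpha of 0.05, and example values are hypothetical.

```python
def bonferroni_filter(p_values, n_compared=None, alpha=0.05):
    """Apply a Bonferroni threshold of alpha / n to per-metabolite P values.

    p_values maps metabolite name -> P value, or None when the P value was not
    reported. n_compared is the number of features the original study compared;
    it defaults to the number of entries supplied. Returns (retained, excluded).
    """
    n = n_compared if n_compared is not None else len(p_values)
    if n == 0:
        return {}, []
    threshold = alpha / n
    retained, excluded = {}, []
    for name, p in p_values.items():
        if p is None:
            excluded.append(name)  # P value not reported: cannot be corrected
        elif p < threshold:
            retained[name] = p
    return retained, excluded


# Hypothetical P values for illustration; not taken from any included study.
reported = {"histidine": 0.001, "valine": 0.02, "tryptophan": None}
retained, excluded = bonferroni_filter(reported, n_compared=4)
# threshold = 0.05 / 4 = 0.0125, so only histidine is retained
```

Passing the study's full number of compared features, rather than only the features it highlighted, matters: a smaller n would make the threshold too lenient.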
Study quality
The QUADAS-2 tool was used to assess the risk of methodologic bias (14). The STARD checklist was used to assess the transparency of reporting (15). The CAWG-MSI checklist was also used to assess the reporting of methodological quality (16). Endoscopic diagnosis was considered the universal reference standard. The Fisher exact test was used to compare reporting quality between the lower-scoring and upper-scoring halves of studies.
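For a single checklist item, the lower-half versus upper-half comparison reduces to a 2 × 2 table (score half × item reported or not). The sketch below computes a two-sided Fisher exact p-value from the hypergeometric distribution using only the Python standard library; it assumes the common convention of summing all tables no more likely than the observed one. The function name and the counts in the example are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same margins
    that is no more likely than the observed one.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2
    denominator = comb(total, col1)

    def table_probability(x):
        # Probability that the top-left cell equals x given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denominator

    p_observed = table_probability(a)
    low, high = max(0, col1 - row2), min(row1, col1)
    # A small relative tolerance guards against floating-point ties.
    return sum(
        p for p in (table_probability(x) for x in range(low, high + 1))
        if p <= p_observed * (1 + 1e-7)
    )


# Hypothetical counts: rows = lower-/upper-half STARD scorers,
# columns = item reported / not reported.
p = fisher_exact_two_sided(3, 9, 10, 2)
```

An exact test is a natural choice here because, with roughly 20 studies per half, some cells are likely to hold small counts.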
Data availability
Data were generated by the authors and were included in the main article and Supplementary Material.
Results
This study aimed to track the progress of blood biomarkers to date by collating data from the previous review (7) and the current review. The systematic literature search returned 44,771 records. After screening and duplicate removal, 43 new articles met the inclusion criteria and were included (Fig. 1). These studies, published between 2015 and 2020, recruited 7,596 participants. Twenty-one studies included data from more than 100 patients, and one study recruited more than 1,000 patients. Almost all studies (42/43) were phase I biomarker discovery studies, most of which investigated a single cancer type. Twelve studies investigated esophageal cancer (17–28), 7 gastric cancer (29–35), and 22 colorectal cancer (36–57). Two studies investigated more than one gastrointestinal malignancy (58, 59) (Table 1). The most common analytical instruments were LC-MS (56%) and GC-MS (19%), with the remainder using a combination of instruments or alternative instruments (e.g., NMR). Across both reviews (2007–2020), a total of 72 studies of circulating metabolic biomarkers of gastrointestinal cancer were identified.
| Study | Cancer | Total n (cancer n) | Biomarker discovery | Analytical platform(s) | Final classification method | Diagnostic indices of final method | Significant features | Features after MC |
|---|---|---|---|---|---|---|---|---|
| Studies investigating esophageal malignancies | | | | | | | | |
| Liang 2015 | EAC | 60 (30) | Targeted | LC-MS/MS | 1 metabolite | Not reported | 1 | n/a |
| Bhatt 2016 | EAC | 39 (20) | Targeted | SIFT-MS | LRM (2 metabolites) | AUROC 0.83, Sens 84%, Spec 90% | 9 | 2ᵃ |
| Buas 2017 | EAC | 322 (100) | Targeted | LC-MS | MCCV MV model | AUROC 0.75 | 4 | 1ᵃ |
| White 2017 | EAC | 643 (320) | Targeted | LC-MS/MS | 1 metabolite | Not reported | 1 | n/a |
| Zhu 2017 | “EC” | 45 (24) | Targeted | GC-TOF-MS | PLS-DA MV model | AUROC 0.95, Sens 83.3%, Spec 100% | 22 | Individual P values not reported |
| Mir 2015 | ESCC | 50 (40) | Nontargeted | LC-QTOF-MS | 4 metabolites | Not reported | 652 | Individual P values not reported |
| Wang 2016 | ESCC | 202 (97) | Nontargeted | UHPLC-QTOF-MS | PLS-DA MV model | AUROC 0.90, Sens 85.0%, Spec 90.5% | 16 | 16 |
| Cheng 2017 | ESCC | 104 (76) | Targeted | UPLC-MS/MS | LRM (4 metabolites) | AUROC 0.87 | 4 | 3ᵃ |
| Cheng 2017 | ESCC | 67 (40) | Targeted | LC-MS/MS | LRM (4 metabolites) | AUROC 0.75 | 4 | 3ᵃ |
| Ma 2018 | ESCC | 66 (34) | Nontargeted | 2D LC-MS | Hierarchical clustering (120 metabolites) | Not reported | 120 | Individual P values not reported |
| Liu 2019 | ESCC | 65 (25) | Nontargeted | NMR | OPLS-DA MV model | AUROC 0.96 | 29 | 25ᵃ |
| Zhu 2020 | ESCC | 310 (140) | Nontargeted | UPLC-QTOF-MS | PCA & OPLS-DA MV model | AUROC 0.97, Sens 88.3%, Spec 88.9% | 34 | 34 |
| Studies investigating gastric malignancies | | | | | | | | |
| Choi 2016 | GAC | 52 (35) | Targeted | LC-MS/MS | 5 metabolites | Not reported | 5 | 5ᵃ |
| Kuligowski 2016 | GAC | 143 (33) | Nontargeted | UPLC-TOF-MS | PLS-DA MV model | AUROC 0.88, Sens 76%, Spec 92% | 151 | Individual P values not reported |
| Lario 2017 | GAC | 80 (20) | Targeted | LC-MS | PCA & PLS-DA MV models | AUROC 0.83, Sens 80%, Spec 74% | 13 | 13 |
| Jing 2018 | GAC | 166 (84) | Targeted | LC-MS/MS | PLS-DA MV model | AUROC 0.92, Sens 85.5%, Spec 89.1% | 13 | 5ᵃ |
| Liu 2018 | GAC | 162 (80) | Nontargeted | GC-MS | OPLS-DA MV model | AUROC 1.0 | 25 | Individual P values not reported |
| Xiu 2019 | GAC | 154 (104) | Targeted | UHPLC-MS/MS | 21 metabolites | Not reported | 21 | Individual P values not reported |
| Sun 2020 | GAC | 69 (21) | Nontargeted | UPLC-QTOF-MS | OPLS-DA MV model | AUROC 0.99, Sens 98%, Spec 95% | 27 | Individual P values not reported |
| Studies investigating colorectal malignancies | | | | | | | | |
| Chen 2015 | CRC | 40 (20) | Nontargeted | UPLC-QTOF-MS | PLS-DA MV model | Not reported | 20 | Individual P values not reported |
| Crotti 2016 | CRC | 98 (63) | Nontargeted | GC-TOF-MS | 1 metabolite | AUROC 0.82, Sens 87.8%, Spec 80% | 4 | 1ᵃ |
| Deng 2016 | CRC | 127 (28) | Nontargeted | NMR & LC-MS/MS | MCCV-PLS-DA MV model | NMR: AUROC 0.84; LC-MS/MS: AUROC 0.93 | NMR: 6; LC-MS/MS: 17 | NMR: 3; LC-MS/MS: 10 |
| Farshidfar 2016 | CRC | Discovery: 378 (222); validation: 165 (98) | Nontargeted | GC-MS | OPLS-DA MV model | AUROC 0.91, Sens 85%, Spec 86% | 28 | 14ᵃ |
| Zhang 2016 | CRC | 324 (139) | Targeted | CBDI-nanoESI FTICR-MS | 4 metabolites | AUROC 0.93, Sens 84.6%, Spec 89.8% | 6 | 5ᵃ |
| Hata 2017 | CRC | 1,141 (225) | Targeted | FIA-MS/MS | LRM (1 metabolite) | AUROC 0.91, Sens 83.3%, Spec 84.8% | 1 | 1 |
| Long 2017 | CRC | Discovery: 90 (30); validation: 150 (5) | Nontargeted | LC-MS/MS | LRM (3 metabolites) | Not reported | 50 | 4ᵃ |
| Mika 2017 | CRC | 36 (19) | Targeted | GC-MS | 1 metabolite | Not reported | 7 | Individual P values not reported |
| Nishiumi 2017 | CRC | 573 (282) | Nontargeted | GC-MS | LRM (29 metabolites) | AUROC 0.99, Sens 99.3%, Spec 93.8% | 41 | 29 |
| Separovic 2017 | CRC | 20 (10) | Targeted | LC-MS | 6 metabolites | Not reported | 6 | 6 |
| Shen 2017 | CRC | 35 (25) | Nontargeted | 2D LC-QTOF-MS | PCA MV model | AUROC > 0.9 | 64 | Individual P values not reported |
| Uchiyama 2017 | CRC | 175 (56) | Nontargeted | CE-TOF-MS | PCA MV model | AUROC 0.74–0.89 | 4 | 4 |
| Zhang 2017 | CRC | 35 (25) | Targeted | LC-MS & UPLC-MS/MS | 5 metabolites | Not reported | 5 | Individual P values not reported |
| Asante 2018 | CRC | 36 (26) | Nontargeted | LC-MS | PCA MV model | Not reported | 30 | 22 |
| Farshidfar 2018 | CRC | 174 (62) | Nontargeted | FIA-MS/MS | OPLS-DA MV model | AUROC 0.98, Sens 93%, Spec 95% | 48 | 32ᵃ |
| Liu 2018 | CRC | 55 (25) | Nontargeted | GC-MS | PCA, PLS-DA & OPLS-DA MV models | AUROC 0.96–1.00 | 25 | 25 |
| Messias 2018 | CRC | 41 (23) | Nontargeted | ESI-QTOF-MS & GC-FID | PCA & (O)PLS-DA MV models | Not reported | 15 | Individual P values not reported |
| Shu 2018 | CRC | 495 (250) | Nontargeted | GC-TOF-MS & UPLC-QTOF-MS | LRM (9 metabolites) | AUROC 0.76 | 35 | 35 |
| Wood 2018 | CRC | 141 (67) | Targeted | ESI-Orbitrap-MS | 1 metabolite | Not reported | 1 | n/a |
| Gu 2019 | CRC | 110 (40) | Nontargeted | NMR | PLS-DA MV model | AUROC 0.83 | 19 | Individual P values not reported |
| Serafim 2019 | CRC | 84 (40) | Nontargeted | MALDI-MS | PLS-DA MV model | AUROC 0.92 | 15 | Individual P values not reported |
| Wu 2020 | CRC | 90 (45) | Nontargeted | GC-MS | PLS-DA MV model | AUROC 0.81 | 8 | 0ᵃ |
| Studies investigating multiple malignancies | | | | | | | | |
| Lee 2019 | CRC | 36 (16) | Targeted | UPLC-ESI-MS/MS | PCA MV model | AUROC 0.86 | 10 | Individual P values not reported |
| | GAC | 40 (20) | Targeted | UPLC-ESI-MS/MS | PCA MV model | AUROC 0.91 | 16 | Individual P values not reported |
| Zhang 2019 | CRC | 59 (21) | Targeted | UPLC-ESI-MS/MS | PLS-DA MV model | AUROC 0.92–0.97 | 70 | Individual P values not reported |
| | GAC | 49 (11) | Targeted | UPLC-ESI-MS/MS | PLS-DA MV model | AUROC 0.85–0.98 | 81 | Individual P values not reported |
Note: Where studies included discovery and validation cohorts, the diagnostic metrics of the validation set were used for analysis.
Abbreviations: AUROC, area under the receiver operating characteristic curve; CBDI-nanoESI FTICR-MS, chip-based direct-infusion nanoESI Fourier transform ion cyclotron resonance mass spectrometry; CRC, colorectal adenocarcinoma; EAC, esophageal adenocarcinoma; ESCC, esophageal squamous cell carcinoma; ESI-TOFMS, electrospray ionization time-of-flight mass spectrometry; FIA-MS/MS, flow injection analysis-tandem mass spectrometry; GAC, gastric adenocarcinoma; GC-FID, gas chromatography-flame ionization detection; GC-MS, gas chromatography-mass spectrometry; LDA, linear discriminant analysis; LRM, logistic regression model; MALDI-MS, matrix-assisted laser desorption/ionization mass spectrometry; MC, multiplicity correction; MCCV, Monte Carlo cross-validation; MS/MS, tandem mass spectrometry; MV, multivariable; NMR, nuclear magnetic resonance spectroscopy; OPLS-DA, orthogonal projections to latent structures discriminant analysis; PCA, principal component analysis; PLS-DA, partial least squares discriminant analysis; ROC, receiver operating characteristic curve; Sens, sensitivity; SIFT-MS, selected-ion flow-tube mass spectrometry; Spec, specificity; UHPLC, ultra-high-performance liquid chromatography.
ᵃWe applied a Bonferroni correction (α/n, where n is the number of compared features).
Biomarker translation
Since 2007, 77 metabolic biomarkers (single metabolites or composite panels/signatures) have been identified from 72 relevant publications. Of these, only one was validated in a separate study (38); every other biomarker had not progressed beyond the discovery phase (Fig. 2A). None is currently in clinical use, commercially available, or recommended in guidelines. AUROCs were reported for 41 biomarkers (53.2%). Of these, 25 (61%) had an AUROC >0.9, but only 3 were patented. Among the 36 panels with no reported AUROC, 4 had been patented. Thus, 7 biomarkers (9.09%) had obtained patent protection, one was commercialized and subsequently withdrawn for unknown reasons (60, 61), and none were clinically approved, emphasizing the extent of the biomarker translational gap (Fig. 2B). Most patents (5/7, 71.4%) originated from universities or academic institutions, whereas the remaining two originated from the biotechnology industry. In summary, there are currently no commercially available diagnostic tests for gastrointestinal cancers based on metabolic biomarkers, despite 7 being patent-protected and a considerable proportion reporting AUROCs >0.9. Individual research groups appeared to be conducting multiple biomarker discovery studies rather than validating or refining existing promising biomarkers (among the 72 publications, eight groups were working on more than one biomarker; see Supplementary Table S3).
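The translation funnel is simple arithmetic on the counts reported in this section and in the esophago-gastric and colorectal summaries. The sketch below reproduces those proportions and checks that the subgroup counts add up to the pooled totals; the variable names are illustrative, and the counts are copied from the text rather than recomputed from study data.

```python
def percent(part, whole, digits=1):
    """Percentage of whole represented by part, rounded for reporting."""
    return round(100 * part / whole, digits)


# Counts as reported in the text for 2007-2020.
total_panels = 77
eg_panels, crc_panels = 36, 41          # esophago-gastric, colorectal
auroc_reported = 41
auroc_over_09 = 25
eg_over_09, crc_over_09 = 11, 14
patented_over_09, patented_no_auroc = 3, 4

# Subgroup counts should add up to the pooled totals.
assert eg_panels + crc_panels == total_panels
assert eg_over_09 + crc_over_09 == auroc_over_09

patented = patented_over_09 + patented_no_auroc
print(percent(auroc_reported, total_panels))    # 53.2
print(percent(auroc_over_09, auroc_reported))   # 61.0
print(percent(patented, total_panels, 2))       # 9.09
print(percent(eg_over_09, eg_panels))           # 30.6
print(percent(crc_over_09, crc_panels))         # 34.1
```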
Biomarker progress along the development pathway
Although only one biomarker panel was validated in a separate study (Fig. 2A), many individual metabolites were reported in more than one study. To establish reporting frequencies, metabolites reported within biomarker panels were treated individually and combined with single-metabolite biomarkers. This yielded 1,796 reported circulating metabolites since 2015. After Bonferroni adjustment, 281 metabolites from 28 publications remained significantly different (Supplementary Table S4), many of which had been reported previously (7). Figure 2C shows the 15 most frequently reported metabolites across both reviews (since 2007). Of these, only valine, histidine, and tryptophan changed in the same direction in more than three cancer types (Supplementary Table S5). For the three most commonly reported metabolites (leucine/isoleucine, histidine, and tryptophan), cumulative reporting frequency was plotted over time for each cancer subtype (Fig. 3), together with patent dates where available. Reporting frequency accumulated steadily for several cancer subtype–metabolite pairs, 5 of which were reported at least 4 times.
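The cumulative reporting curves and the "reported at least 4 times" count can be sketched as follows, assuming each report has been reduced to a (cancer subtype, metabolite, publication year) tuple. The function names and the example reports are hypothetical and do not reproduce the data behind Fig. 3.

```python
from collections import Counter, defaultdict


def cumulative_reporting(reports):
    """Build cumulative reporting curves per (cancer subtype, metabolite) pair.

    reports: iterable of (subtype, metabolite, publication_year) tuples.
    Returns {(subtype, metabolite): [(year, cumulative_count), ...]}.
    """
    counts_by_pair = defaultdict(Counter)
    for subtype, metabolite, year in reports:
        counts_by_pair[(subtype, metabolite)][year] += 1
    curves = {}
    for pair, yearly in counts_by_pair.items():
        running, curve = 0, []
        for year in sorted(yearly):
            running += yearly[year]
            curve.append((year, running))
        curves[pair] = curve
    return curves


def pairs_reported_at_least(curves, k):
    """Pairs whose final cumulative count reaches k reports."""
    return sorted(pair for pair, curve in curves.items() if curve[-1][1] >= k)


# Hypothetical reports for illustration only.
reports = [
    ("CRC", "histidine", 2009), ("CRC", "histidine", 2012),
    ("CRC", "histidine", 2016), ("CRC", "histidine", 2016),
    ("ESCC", "tryptophan", 2015), ("ESCC", "tryptophan", 2019),
]
curves = cumulative_reporting(reports)
print(pairs_reported_at_least(curves, 4))  # [('CRC', 'histidine')]
```

Each curve is a step function, so plotting it against year directly shows when a subtype–metabolite pair was "re-discovered".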
Longitudinal analysis of methodologic and reporting quality
As noted in the previous review, reported sample procurement and preparation methods varied widely (Table 1). All studies were conducted to evaluate potential diagnostic biomarkers, but only 29 (67%) reported diagnostic accuracy metrics such as sensitivity, specificity, and/or AUROC. In most cases (26/43 studies), these diagnostic indices were derived directly from a multivariate model, usually partial least squares regression or logistic regression (Table 1). This is an improvement on our previous review, in which only 56% of studies reported diagnostic accuracy figures (7).
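For readers less familiar with the metrics in Table 1, the sketch below shows how an AUROC and a sensitivity/specificity pair can be obtained from model scores. It uses the rank interpretation of the AUROC (the probability that a randomly chosen case scores higher than a randomly chosen control, counting ties as one half) rather than integrating an ROC curve. The scores, threshold, and function names are hypothetical. When such indices are computed on the same data used to fit the model, they tend to be optimistic, which is why independent validation sets matter.

```python
def auroc(case_scores, control_scores):
    """Probability that a random case outscores a random control (ties count 1/2)."""
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


def sens_spec(case_scores, control_scores, threshold):
    """Sensitivity and specificity when score >= threshold is called positive."""
    true_positives = sum(score >= threshold for score in case_scores)
    true_negatives = sum(score < threshold for score in control_scores)
    return true_positives / len(case_scores), true_negatives / len(control_scores)


# Hypothetical model scores (e.g., predicted probabilities) for illustration.
cases = [0.9, 0.8, 0.4]
controls = [0.3, 0.5, 0.1]
print(auroc(cases, controls))
print(sens_spec(cases, controls, threshold=0.45))
```

The pairwise loop is quadratic in cohort size, which is fine for studies of a few hundred participants; a rank-based computation scales better for larger cohorts.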
The risk of methodological bias was often unclear (assessed using QUADAS-2, Supplementary Table S2 and Supplementary Fig. S1). Studies often did not report whether participants were recruited consecutively (patient selection) or whether the index test results were interpreted without knowledge of the results of the reference standard (index test). Only one study reported the time interval between the index test and reference standard (flow and timing). However, there were few applicability concerns in terms of patient selection, index test, and reference standard. This was broadly similar to the previous review.
The reporting quality of papers published from 2015 onwards was medium to poor (assessed using STARD; median score 17/34; see Supplementary Table S6). No study satisfied all items. Reporting was limited for how indeterminate results were handled, how intended sample sizes were calculated, and participant characteristics. Studies with higher STARD scores were more likely to report the timing of data collection and recruitment, blinding, and methods for estimating diagnostic accuracy (P < 0.005). This was broadly similar to the previous review.
Reporting quality was also assessed using the CAWG-MSI tool (Supplementary Table S7; Supplementary Fig. S2). The CAWG-MSI provides a consensus set of minimum reporting standards in metabolomics, focusing primarily on NMR and MS techniques. Most studies (>90%) reported their chosen extraction methods and instrument parameters, although fewer than 20% reported method validation parameters (e.g., accuracy, precision, and limits of detection/quantification). Most studies (67%) did not perform level 1 compound identification using authentic standards, relying instead on spectral libraries and chemical class similarity for annotation. Calibration curves were omitted by 49% of studies (Supplementary Table S8). To test whether potentially impactful biomarkers were associated with higher-quality methodology, the CAWG-MSI and STARD scores of patented and non-patented studies were compared with the Mann–Whitney U test (see Supplementary Table S9). There were no significant differences, suggesting that the small number of patented studies were not more thoroughly reported.
Promising candidates and current gaps to implementation
Esophago-gastric cancer
Twelve new studies investigated esophageal cancer [4 esophageal adenocarcinoma, 7 esophageal squamous cell carcinoma (ESCC), and 1 mixed], 3 of which reported AUROC > 0.95. For example, in ESCC, Zhu and colleagues (27) identified a novel panel of 34 metabolites using logistic regression analysis in a cohort of 310 participants that accurately diagnosed ESCC in both training and validation sets (AUROC > 0.96). Some biomarkers have been identified in more than one study, such as significantly increased fatty acids (linoleic acid and palmitoleic acid; refs. 25, 27) and reduced amino acids (alanine and tryptophan), which were reported in 3 separate ESCC studies (17, 25, 27). In esophageal adenocarcinoma, additional disease-associated metabolites were reported compared with the previous review. For example, Buas and colleagues (19) reported that urate could strongly differentiate high-grade dysplasia (HGD)/esophageal adenocarcinoma from Barrett's esophagus (fold change 1.11, P = 0.0002, FDR q value = 0.01). Moreover, Bhatt and colleagues (28) analyzed volatile serum metabolites by SIFT-MS and identified two metabolites that could accurately diagnose esophageal adenocarcinoma (AUROC 0.83).
Including multicancer studies, 9 new studies investigated gastric cancer, reporting promising new metabolite associations; 3 of these reported AUROC > 0.95 (29, 31, 34). Reduced tryptophan levels were reported in 3 independent gastric cancer studies (30, 31, 33). In a large cohort of 166 participants, Jing and colleagues (31) reported a panel of 5 amino acids that could accurately distinguish gastric cancer from gastric ulcer (sensitivity 86%, specificity 89%, AUROC 0.92). Lario and colleagues (33) also identified a panel of tryptophan and phenylalanine metabolites that discriminated gastric cancer from precursor lesions with a sensitivity and specificity of 80% and 74%, respectively.
In summary, 30.6% (11/36) of esophago-gastric biomarker panels reported high diagnostic accuracy (AUROC > 0.9), but only 1 of these was patented. AUROC was not reported for 17 of the 36 panels, 1 of which was patented. None of the non-patented biomarkers have been externally validated or have validation trials registered on ClinicalTrials.gov.
Colorectal cancer
A total of 24 new studies investigated colorectal cancer. From 2007 to 2020, amino acid metabolism was the most frequently reported deregulated domain in patients with colorectal cancer: reduced tryptophan and valine were reported in 5 studies (47, 62–65), and histidine in 7 independent studies (refs. 40, 41, 52, 63–66; typically with reduced concentrations in the cancer state). Several new studies included highly accurate diagnostic models, 6 of which reported AUROC > 0.95. For example, Nishiumi and colleagues (47) followed up their 2012 study with a refined diagnostic model for early-stage cancer applied in 573 participants. An eight-metabolite model derived via multiple logistic regression analysis showed high sensitivity and specificity (sensitivity 99.3%, specificity 93.8%, AUROC 0.99), although it requires further validation. Furthermore, Farshidfar and colleagues (38) derived a semiquantitative GC-MS metabolomic signature from a discovery cohort of 378 participants that displayed high accuracy in an independent validation group (AUROC 0.91). The same group later applied a quantitative flow injection analysis-tandem MS method with wider metabolite coverage to derive a 48-metabolite signature that performed robustly in both training and external validation sets (AUROC 0.98; ref. 37).
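Several of the panels above were derived by (multiple) logistic regression and evaluated by AUROC in training and validation sets. The pure-Python sketch below illustrates that workflow on simulated data: a hypothetical three-metabolite logistic model is fitted by batch gradient descent on a training set and scored on a separate validation set, with AUROC computed as the probability that a randomly chosen case scores higher than a randomly chosen control (the Mann–Whitney interpretation). The simulation, metabolite count, learning rate, and epoch count are all assumptions for illustration, and a held-out split of the same simulated population is not equivalent to the external validation the text calls for.

```python
import math
import random


def fit_logistic(X, y, lr=0.1, epochs=500):
    """Fit logistic regression by batch gradient descent; return (weights, bias)."""
    n, k = len(X), len(X[0])
    w, b = [0.0] * k, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * k, 0.0
        for xi, yi in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            err = 1.0 / (1.0 + math.exp(-z)) - yi  # predicted probability minus label
            for j in range(k):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def auroc(labels, scores):
    """Mann-Whitney AUROC: P(case score > control score), ties counted as 0.5."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if c > d else 0.5 if c == d else 0.0 for c in cases for d in controls)
    return wins / (len(cases) * len(controls))


def simulate(n_per_group, n_metabolites, shift, rng):
    """Hypothetical data: cases have every metabolite shifted by `shift` SDs."""
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_group):
            X.append([rng.gauss(shift * label, 1.0) for _ in range(n_metabolites)])
            y.append(label)
    return X, y


rng = random.Random(0)
X_train, y_train = simulate(50, 3, 2.0, rng)
X_valid, y_valid = simulate(50, 3, 2.0, rng)
w, b = fit_logistic(X_train, y_train)
# Linear predictor used as the panel score (monotone in the fitted probability).
valid_scores = [b + sum(wj * xj for wj, xj in zip(w, x)) for x in X_valid]
print("validation AUROC:", round(auroc(y_valid, valid_scores), 3))
```

Scoring on the linear predictor rather than the fitted probability gives the same AUROC, because the logistic function preserves rank order.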
In summary, 34.1% (14/41) of colorectal biomarkers reported high diagnostic accuracy (AUROC > 0.9), but only 2 of these were patented. AUROC was not reported for 46.3% (19/41) of biomarkers, of which 3 were patented. None of the unpatented biomarkers have been externally validated or have validation trials registered on ClinicalTrials.gov. From 2007 to 2020, only a single biomarker was commercialized (8), and it was subsequently withdrawn from the market for unknown reasons (Cologic, Phenomenome Discoveries; refs. 60, 61).
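The summary proportions in the two Results subsections can be cross-checked against the totals given in the Abstract (77 panels, 25 with AUROC > 0.9, 9.09% patented). The sketch below uses only the counts reported above; it assumes, as the matching 9.09% figure implies, that the patented panels listed in the two summary paragraphs are the only patented panels.

```python
# Counts as reported in the esophago-gastric and colorectal summary paragraphs.
panels = {
    "esophago-gastric": {"total": 36, "auroc_above_0_9": 11, "auroc_not_reported": 17,
                         "patented": 1 + 1},  # 1 high-AUROC + 1 without AUROC
    "colorectal": {"total": 41, "auroc_above_0_9": 14, "auroc_not_reported": 19,
                   "patented": 2 + 3},  # 2 high-AUROC + 3 without AUROC
}


def pct(numerator, denominator, digits=1):
    """Percentage rounded to `digits` decimal places."""
    return round(100 * numerator / denominator, digits)


total = sum(p["total"] for p in panels.values())
high = sum(p["auroc_above_0_9"] for p in panels.values())
patented = sum(p["patented"] for p in panels.values())

for name, p in panels.items():
    print(name, pct(p["auroc_above_0_9"], p["total"]), pct(p["auroc_not_reported"], p["total"]))
print(total, high, pct(patented, total, digits=2))
```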
Discussion
This study's objective was to describe the barriers to clinical implementation of serum metabolic biomarkers in gastrointestinal cancer. The main finding is that large numbers of candidate biomarkers have been discovered, many with excellent diagnostic performance in early discovery studies. For example, in upper gastrointestinal cancer, 11 biomarker panels were reported with an AUROC > 0.9. In lower gastrointestinal cancer, 14 panels reported an AUROC > 0.9, suggesting potentially better performance than the currently available pre-endoscopy gold-standard test (fecal immunochemical test; ref. 67). Several studies also suggest that systemic metabolomic differences may accurately discriminate early stages of cancer (17, 20, 25, 29, 42, 47, 52, 55), normalize after cancer resection (68), and be consistent between Western and Asian populations (36, 39), which provides biological and translational assurance. This emphasizes the void between a real clinical need (currently completely unaddressed in upper gastrointestinal cancer) and numerous seemingly effective metabolic biomarkers that have not progressed beyond the discovery stage.
We identified 72 studies published since 2007 that recruited 18,431 patients in total. Certain discriminating metabolites have been consistently reported in independent studies, suggesting that they may perform robustly as diagnostic biomarkers. Histidine, for example, has been “discovered” in 7 publications evaluating colorectal cancer biomarkers, but none of those discoveries has prompted a deliberate follow-up validation.
A number of the repeatedly discovered biomarkers were reported as upregulated in some articles and downregulated in others studying the same cancer type (Supplementary Table S5), which suggests that they may not be useful biomarkers of disease. In contrast, valine, histidine, and tryptophan were consistently downregulated in two cancer types, marking them as candidate biomarkers that warrant further investigation.
Patent status is one indicator of biomarker success, as patent protection is a step toward clinical implementation (69). Because of the lack of clinically adopted biomarkers, patent status was used in a composite measure of clinical biomarker adoption, and patent protection was identified for only a small proportion of the biomarkers (9.09%). However, patented biomarkers were not more likely to (i) arise from an independently funded study, (ii) be independently validated, or (iii) have higher general or analytic-specific reporting quality. Moreover, as indicated in Fig. 2A, only one of the 77 metabolites/metabolite panels reported was validated in a separate, independent study. No publication reported negative results, suggesting publication bias. This compounds the difficulty of interpreting repeatedly discovered biomarkers, which may have failed to replicate in further studies with appropriate analytics that were never published. An open-access repository for both positive and negative data might help overcome this and advance the field.
As quantitatively supported in this review, there is an evident lack of external validation of biomarkers. Reasons for this include poor prioritization of resources, a perceived limited translational capacity of the underlying analytic technique, and perceived poor performance of the discovered biomarker(s) (10, 69). Metabolomics is fortunate in that relevant analyzers are standard NHS equipment in tertiary care centers, which makes it more translatable than other molecular profiling techniques. In addition, a number of the biomarkers had exceptional performance at the discovery stage. Thus, future resources should be channeled away from further discovery and toward targeted studies that improve performance through analytic refinement and/or augmentation strategies, and that then seek appropriate external validation with adequate power and clinically relevant controls. Moreover, as seen in Fig. 2B, only 9.09% of the reported biomarkers currently have patent protection. This pattern is consistent with Drucker and Krapfenbauer (69), who note that the clinical use of biomarkers remains a daunting task.
Four articles were identified that included more than one cancer type (58, 59, 63, 70). All used forms of multiple classification analysis to identify metabolites discriminating each cancer. Miyagi and colleagues (63), for example, found that metabolites could discriminate between cancer types, but less accurately than between cancer and healthy controls (accuracy range, 46%–62.3%). However, the diagnostic performance of these metabolite panels in distinguishing a specific cancer type from other cancers and healthy controls combined was not reported in these studies. Therefore, larger validation studies comparing different diseases and different disease stages are needed, as is better standardization to permit meaningful meta-analysis.
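The metric that these multicancer studies did not report, performance for one cancer type against other cancers and healthy controls combined, can be derived from a multiclass confusion matrix by pooling all non-target classes (a one-vs-rest view). A minimal sketch; the class labels, counts, and function names are hypothetical and do not come from the reviewed studies.

```python
CLASSES = ["colorectal", "gastric", "control"]

# Invented counts: rows = true class, columns = predicted class (order as CLASSES).
CONFUSION = [
    [40, 8, 2],   # true colorectal
    [10, 35, 5],  # true gastric
    [3, 2, 45],   # true control
]


def one_vs_rest(confusion, classes, target):
    """Sensitivity and specificity of `target` versus all other classes pooled."""
    t = classes.index(target)
    n = len(classes)
    tp = confusion[t][t]
    fn = sum(confusion[t][j] for j in range(n) if j != t)
    # Other cancers and controls that were wrongly called `target`.
    fp = sum(confusion[i][t] for i in range(n) if i != t)
    # Non-target samples not called `target` (even if confused with each other).
    tn = sum(confusion[i][j] for i in range(n) for j in range(n) if i != t and j != t)
    return tp / (tp + fn), tn / (tn + fp)


def overall_accuracy(confusion):
    """Fraction of all samples assigned to their true class."""
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return correct / sum(map(sum, confusion))


for target in CLASSES:
    print(target, one_vs_rest(CONFUSION, CLASSES, target))
print("accuracy", overall_accuracy(CONFUSION))
```

In this pooled view, a gastric cancer misclassified as a control still counts as a true negative for colorectal cancer, which is why one-vs-rest specificity can look better than overall multiclass accuracy; reporting both would make multicancer panels easier to compare.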
Accurate methodological reporting facilitates experimental standardization and enables data re-interrogation and comparison (16). Our review of current reporting practice identified several gaps. Based on QUADAS-2, the methods of population enrolment and the interval between sampling and analysis were unclear in most studies assessed (Supplementary Table S2; Supplementary Fig. S1). Reporting quality, assessed using STARD, was medium to poor (median score 17/34; Supplementary Table S6). Items including handling of indeterminate results, sample size calculation, and participant description were not reported. Overall, the aggregate QUADAS-2 and STARD scores in this update were similar to those in the previous review, although there was a small improvement in the proportion of studies reporting standard diagnostic accuracy metrics for their biomarkers. These checklists enhance study quality without adding cost, and their use in research design is encouraged.
Fewer than 20% of publications reported aspects of method validation (e.g., accuracy, precision, and limits of detection/quantification), highlighting the need to establish analytic validity early in biomarker discovery. Similarly, only 33% of studies verified metabolite identity using authentic standards (i.e., CAWG-MSI Level 1 identification), indicating that most metabolite annotations from spectral data were not confidently confirmed (Supplementary Fig. S2). These shortcomings reduce confidence in biomarker quality and weaken the case for implementation. Further limitations of the reviewed literature are the exploratory nature of nearly all 72 studies, the lack of comparison against existing tests (e.g., FIT for colorectal cancer), and the lack of testing in primary care. Additional problems include a lack of clinical utility assessments, cost analyses, and repeatable assay validation. The platform used to measure the biomarker is also important: although MS and NMR platforms can provide high-sensitivity assays, they face a variety of challenges, mainly associated with cost, data complexity, variable metabolome coverage, and data acquisition (71, 72).
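Method validation parameters such as limits of detection/quantification and precision can be estimated cheaply even at the discovery stage. As one illustration, the sketch below follows the calibration-curve convention described in the ICH Q2 guideline (LOD = 3.3σ/S and LOQ = 10σ/S, with σ the residual standard deviation of the regression line and S its slope) and expresses precision as a coefficient of variation of replicate measurements. The calibration points and QC replicates are invented, the function names are hypothetical, and laboratories may instead use blank-based or signal-to-noise approaches.

```python
import math
import statistics


def calibration_lod_loq(concentrations, responses):
    """Least-squares calibration line; LOD = 3.3*sigma/S and LOQ = 10*sigma/S,
    where sigma is the residual SD and S the slope (ICH Q2-style convention)."""
    n = len(concentrations)
    mean_x = statistics.mean(concentrations)
    mean_y = statistics.mean(responses)
    sxx = sum((x - mean_x) ** 2 for x in concentrations)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(concentrations, responses))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [y - (intercept + slope * x) for x, y in zip(concentrations, responses)]
    # Residual SD with n - 2 degrees of freedom (slope and intercept were fitted).
    sigma = math.sqrt(sum(r * r for r in residuals) / (n - 2))
    return slope, 3.3 * sigma / slope, 10 * sigma / slope


def cv_percent(replicates):
    """Precision as the coefficient of variation (%) of replicate measurements."""
    return 100 * statistics.stdev(replicates) / statistics.mean(replicates)


# Invented calibration standards (arbitrary concentration units) and QC replicates.
slope, lod, loq = calibration_lod_loq([0, 1, 2, 3, 4], [1.1, 2.9, 5.1, 6.9, 9.0])
print(round(slope, 2), round(lod, 3), round(loq, 3))
print(round(cv_percent([10.0, 10.2, 9.8]), 1))
```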
Several recommendations have been made to narrow the gap in the field of diagnostics. These recommendations are also highly applicable to other areas, as they address important issues in biomarker research. They include (i) improving diagnostic affordability and acceptability, (ii) adopting appropriate technology that could be widely used, and (iii) adopting diagnostics based on the needs of fragile populations. By focusing on the factors recommended by Fleming and colleagues (73), several fields beyond diagnostics, including prognostic/predictive biomarkers, may benefit in addition to the narrowing of the diagnostic gap.
The chief limitations of this review are the relatively strict exclusion criteria (set to avoid multiplicity error), publication bias, and the skew of metabolite reporting toward compounds that are easily quantified by profiling methods (e.g., easily ionized or present at high concentration). In addition, using patent status in the composite measure of biomarker success may not highlight the most robust biomarkers, as suggested by the finding that patented biomarkers did not have higher quality scores.
Disconnected communication between scientists, clinicians, industry, and end users is a major barrier to biomarker translation (74). A fit-for-purpose analytic methodology needs to be established at an early stage of the pipeline, which requires clear lines of communication between relevant stakeholders. There is an evident need to build on biomarker discovery and proceed to well-defined assay validation, feasibility, and clinical utility studies to unblock the biomarker pipeline, which is currently overloaded with exploratory studies and stalled discoveries. In this way, the most promising biomarkers could be identified and formally evaluated for clinical adoption.
Authors' Disclosures
B. Das reports grants from Medical Research Council during the conduct of the study. No disclosures were reported by the other authors.
Acknowledgments
We thank S. Antonowicz, C.J. Peters, B.G. Hanna, and the London In Vitro Diagnostic Co-operative for their support throughout this study. This study was funded by Imperial College London.
Note: Supplementary data for this article are available at Cancer Epidemiology, Biomarkers & Prevention Online (http://cebp.aacrjournals.org/).