Abstract
Advances in analytics have resulted in metabolomic blood tests being developed for the detection of cancer. This systematic review aims to assess the diagnostic accuracy of blood-based metabolomic biomarkers for endoluminal gastrointestinal (GI) cancer. Using endoscopic diagnosis as a reference standard, methodologic and reporting quality was assessed using validated tools, and pathway-based informatics was used to biologically contextualize discriminant features. Twenty-nine studies (15 colorectal, 9 esophageal, 3 gastric, and 2 mixed) with data from 10,835 participants were included. All reported significant differences in hematologic metabolites. In pooled analysis, 246 metabolites were found to be significantly different after multiplicity correction. Incremental metabolic flux with disease progression was frequently reported. Two promising candidates have been validated in independent populations (both colorectal biomarkers), and one has been approved for clinical use. Network analysis suggested modulation of elements of up to half of Edinburgh Human Metabolic Network subdivisions, and that the poor clinical applicability of commonly modulated metabolites could be due to extensive molecular interconnectivity. Methodologic and reporting quality was assessed as moderate-to-poor. Serum metabolomics holds promise for GI cancer diagnostics; however, future efforts must adhere to consensus standardization initiatives, utilize high-resolution discovery analytics, and compare candidate biomarkers with peer nonendoscopic alternatives. Cancer Epidemiol Biomarkers Prev; 25(1); 6–15. ©2015 AACR.
Introduction
Gastrointestinal (GI) cancer remains an enormous healthcare burden worldwide. Alarm symptoms are often indicative of advanced disease, and hence presymptomatic detection constitutes ideal practice (1, 2). Currently, there are no FDA-approved tools for esophago-gastric cancer screening. In colorectal cancer, primary colonoscopy remains the most popular screening approach in at-risk groups, despite a high relative financial cost and associated clinical risk (3–5). Nonetheless, one third of at-risk patients remain unscreened (3). This unmet clinical need has led to the development of nonendoscopic screening tools that may be more acceptable to patients and could enhance early diagnosis; these range from breath testing for esophago-gastric cancer (6) to the recently approved fecal multitarget DNA testing for colorectal cancer (7, 8).
Metabolomics involves the study of small molecules (<1 kDa, i.e., metabolites) within a tissue or biofluid, most commonly utilizing mass spectrometry (MS) or nuclear magnetic resonance (NMR) spectroscopy (9). When applied to bio-samples, this analytical process seeks to characterize the closest molecular element to the phenotype in the gene–protein–metabolite cascade (10). High-resolution metabolic snapshots are attainable in timeframes of seconds-to-minutes, and translation of proven biomarkers to portable nanosensor platforms is often realistic (11–13). Moreover, being inherently small molecules, metabolites are more likely to be distributed to distant bodily compartments than nucleic acids or proteins, better supporting noninvasive testing (14). This illustrates the potential of metabolomic biomarkers over other molecular signatures: point-of-care, noninvasive diagnostics, prognostics, treatment monitoring, and surveillance (9, 15).
Despite more than 10 years of metabolomic research in cancer, this potential is yet to be fully realized for GI malignancies (16, 17). Tissue metabolomics offers more rapid diagnosis than conventional histology, yet it still requires invasive procedures for tissue acquisition, and cannot offer the depth of information of immunophenotyping (18, 19). Breath and urine analysis represent noninvasive clinical paradigms, but current work is limited to phase I biomarker studies with emphasis on compound identification rather than clinical applicability (16, 20, 21). Additionally, if no evidence of a biochemical mechanism can be elicited (e.g., corresponding hematologic perturbations), introducing tests based on these dislocated surrogates will be impeded. Finally, metabolomic methods often generate large data sets, which are subjected to complex statistics to overcome data-redundancy problems such as multiplicity error. However, these data are often poorly presented for the clinical audience despite consensus statements and recommendations designed to clarify and standardize the reporting of metabolomics (22–26).
Progress in GI cancer blood tests derived from systems biology has been extensively reviewed (27, 28). However, a blood test that assigns risk of colorectal cancer, discovered and developed using the metabolomics approach, has recently been approved for clinical use in Canada (29, 30). As this field was omitted from earlier analyses, the aim of this study was to systematically assess available literature reporting hematologic metabolomic biomarkers from a clinical perspective. Taking advantage of endoscopy as a single reference standard, each contribution was evaluated with validated tools for appraising methodologic and reporting quality.
Materials and Methods
Two authors (S. Antonowicz and T. Wiggins) independently performed a comprehensive literature review using the search terms “*esophagus,” “stomach,” “small intestine,” “colon,” “rectum,” “neoplasm,” “metabo*omics,” “mass spectrometry,” “magnetic resonance spectroscopy,” “metabotyping,” and “metabolic profiling,” together with the medical subject heading (MeSH) terms “esophagus [Mesh],” “stomach [Mesh],” “small intestine [Mesh],” “large intestine [Mesh],” “neoplasm [Mesh],” and “metabolomics [Mesh].” Terms were combined with the Boolean operators AND or OR. Medline, Web of Science, The Cochrane database, Embase, and citation indices to July 1, 2015 were searched. Titles, abstracts, and full-texts were processed using a PRISMA algorithm (see Fig. 1).
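As an illustration of how free-text terms can be combined with Boolean operators, the following minimal Python sketch assembles a query string from the terms listed above. The grouping of terms into anatomy, disease, and method blocks, and the helper names `or_group` and `build_query`, are assumptions made for illustration; the exact query structure used in the review is not stated.

```python
# Illustrative sketch only: the grouping below is an assumption,
# not the query actually submitted to the databases.
anatomy_terms = ["*esophagus", "stomach", "small intestine", "colon", "rectum"]
disease_terms = ["neoplasm"]
method_terms = [
    "metabo*omics",
    "mass spectrometry",
    "magnetic resonance spectroscopy",
    "metabotyping",
    "metabolic profiling",
]


def or_group(terms):
    """Join synonyms with OR and wrap the group in parentheses."""
    return "(" + " OR ".join(f'"{term}"' for term in terms) + ")"


def build_query(*groups):
    """Require one hit from every group by joining the groups with AND."""
    return " AND ".join(or_group(group) for group in groups)


query = build_query(anatomy_terms, disease_terms, method_terms)
print(query)
```

Synonyms within a block are joined with OR to widen recall, whereas blocks are joined with AND so that a record must mention an anatomical site, a neoplasm, and a metabolomic method.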
Selection criteria
Studies were included if they used an NMR or MS method to quantify and compare hematologic metabolites of patients with endoluminal cancer and controls. Studies using other analytical approaches, nonblood biospecimens, or in vitro methodology, studies without an English-language full text, and efforts solely based on other ‘-omics’ sciences were excluded, in addition to two nested case–control series from epidemiologic studies (31, 32) and one that had serially sampled a single cohort (33). Given the intention to critically appraise on the basis of diagnostic accuracy, these latter three were excluded because either the index test and reference standard were too far apart in time (31, 32), or no nondisease cohort was tested (33). In addition, hepatobiliary and pancreatic cancers were excluded as they (i) have different, complex diagnostic gold standards, (ii) can arise in profoundly different metabolic states (e.g., hepatitis and liver failure), and (iii) are less clearly driven by luminal carcinogens and microbiome interactions. Authors of conference abstracts that met inclusion criteria were contacted for relevant information.
Data extraction
A final list of 29 studies for data extraction was agreed. Two authors undertook data extraction and scoring; discrepancies were settled with a third author (S. Kumar). Domains included:
Institution(s) and dates of research
Hypothesis
Number of patients, controls, samples; type of cancer(s)
Metabolomic platform, global or targeted approach
Sample collection and preparation procedures
Statistical analysis
Results, including diagnostic accuracy
Conclusions and applicability
The primary outcome measure for extraction was any clinically relevant diagnostic metric [sensitivity/specificity and/or area under receiver operating characteristic curve (AUROC)] of the principal biomarker(s) or biomarker composite. These measures were solely extracted with respect to primary diagnosis (i.e., cancer or no cancer), rather than surveillance or treatment monitoring. Additionally, metrics describing multivariable modeling performance were also extracted, as were significantly discriminating metabolites.
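For readers less familiar with these metrics, the following minimal Python sketch shows how sensitivity, specificity, and AUROC can be computed from a binary reference standard and a continuous biomarker score. The data, the 0.5 cut-off, and the function names are hypothetical and serve only as an illustration; they are not taken from any included study.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels (1 = cancer)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def auroc(y_true, scores):
    """AUROC as the probability that a randomly chosen case scores higher
    than a randomly chosen control (ties count as one half)."""
    cases = [s for t, s in zip(y_true, scores) if t == 1]
    controls = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if c > k else 0.5 if c == k else 0.0
        for c in cases
        for k in controls
    )
    return wins / (len(cases) * len(controls))


# Hypothetical toy data: 4 cases and 4 controls with a biomarker score.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1]
calls = [1 if s >= 0.5 else 0 for s in scores]  # assumed cut-off of 0.5

sen, sp = sensitivity_specificity(labels, calls)
auc = auroc(labels, scores)
print(f"Sen {sen:.2f}, Sp {sp:.2f}, AUROC {auc:.3f}")
```

Sensitivity and specificity depend on the chosen cut-off, whereas AUROC summarizes discrimination across all possible cut-offs, which is why studies may report one without the other.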
To illustrate how widely the individual malignancies interact with the systemic metabolome, metabolites were pooled and networked to Edinburgh Human Metabolic Network (EHMN) subdivisions. First, a Bonferroni correction was applied to those studies that had chosen not to account for multiplicity error (α/n, where n is the total number of identified metabolic features in that study). Those biomarkers that remained significant were pooled, and cancer-specific metabolite relationships were mapped to EHMN subdivisions using Kyoto Encyclopedia of Genes and Genomes Compound identifiers, using the “Compound-Gene” function of the Metscape 3.2 plugin in Cytoscape (v3.0.2, National Resource of Network Biology, http://cytoscape.org; refs. 34–36). However, pooled analysis of clinical metrics was not felt to be appropriate given the heterogeneity in analytics.
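The Bonferroni step described above can be sketched as follows. This is a minimal Python illustration, assuming a conventional α of 0.05 (the threshold is not stated in the text); the metabolite names, P values, and the function name `bonferroni_significant` are hypothetical.

```python
def bonferroni_significant(p_values, n_features, alpha=0.05):
    """Keep metabolites whose P value is below alpha / n_features.

    p_values   -- mapping of metabolite name to reported P value
    n_features -- total identified metabolic features in the study (n)
    alpha      -- family-wise error rate (0.05 is an assumption here)
    """
    threshold = alpha / n_features
    return {name: p for name, p in p_values.items() if p < threshold}


# Hypothetical study reporting 50 identified features.
reported = {
    "metabolite_A": 0.0004,
    "metabolite_B": 0.012,
    "metabolite_C": 0.00001,
}
retained = bonferroni_significant(reported, n_features=50)
print(sorted(retained))
```

With 50 features the corrected threshold is 0.05/50 = 0.001, so a metabolite reported at P = 0.012 would be nominally significant but would not survive correction.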
Study quality
Methodologic and reporting quality was assessed using the QUADAS-2 tool and STARD checklist, respectively (37, 38), taking endoscopic diagnosis as a universal reference standard. Reporting quality of the lower-half scoring studies was compared with that of the upper-half, and differences were assessed with Fisher exact test. In an attempt to standardize the assessment of the risk of methodologic bias in the index test design, a simple 3-point score was developed using consensus statements and applied as a subcomponent of the QUADAS-2 tool (23, 24). This looked for evidence of the use of internal standards or equivalent (metabolite quantification), external standards or equivalent (metabolite identification), and quality control measures; if any were omitted, the study was considered prone to bias.
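The comparison of lower- and upper-half reporting studies can be illustrated with a minimal Python sketch of a two-sided Fisher exact test on a 2 × 2 table, using only the standard library. The example counts (1 of 15 versus 8 of 14 studies reporting an item) are hypothetical values chosen for illustration, and the two-sided definition used here (summing all tables no more probable than the observed one) is an assumption; the text does not state which variant was applied.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact P value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more probable than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x):
        # Probability that the top-left cell equals x given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    return sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)  # tolerance for float ties
    )


# Hypothetical counts: item reported by 1 of 15 lower-half studies
# and by 8 of 14 upper-half studies.
p_value = fisher_exact_two_sided(1, 14, 8, 6)
print(f"P = {p_value:.4f}")
```

An exact test is a natural choice here because the groups are small (about 15 studies each), so expected cell counts are often too low for a chi-squared approximation.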
Results
The search strategy returned 29 studies, which included blood samples from 10,835 participants between 2007 and 2014 (see Tables 1 and 2 for methods and results, respectively). Sixteen studies included data from more than 100 patients; two studies recruited more than 1,000 patients.
Ref. | Study | Cancer | Total n (Cancer n) | Biomarker discovery | Analytical platform(s) | Final classification method | Diagnostic indices of final method | Significant features | Features after MC |
---|---|---|---|---|---|---|---|---|---|
Studies investigating esophageal malignancies | |||||||||
(39) | Djukovic 2010 | EAC | 26 (14) | Targeted | UPLC-TQMS | 8 metabolites | Not reported | 8 | 5a |
(40) | Zhang 2011 | EAC | 118 (68) | Nontargeted | NMR | PLS-DA MV model | AUROC 0.89, Sen 88%, Sp 92% | 8 | 8 |
(41) | Zhang 2012 | EAC | 113 (67) | Nontargeted | LC-MS (& NMR) | PLS-DA MV model | AUROC 0.950, Sen 89%, Sp 90% | 12 + 8 | 13a |
(42) | Zhang 2013 | “EC” | 50 (25) | Nontargeted | NMR & UPLC-diode array | OPLS-DA MV model | Not reported | NMR > 25; UHPLC = 7 | NMR = 12; UHPLC = 7 |
(43) | Hasim 2012 | ESCC | 148 (108) | Nontargeted | 1H-NMR | OPLS-DA MV model | Not reported | P values not reported | P values not reported |
(44) | Liu 2013 | ESCC | 152 (72) | Nontargeted | UPLC-ESI-TOFMS | 15 metabolites | Not reported | 15 | 11 |
(45) | Xu 2013 | ESCC | 228 (124) | Nontargeted | RRLC/ESI-MS | LRM (7 metabolites) | AUROC 0.961, Sen 90%, Sp 96% | 18 | 3a |
(46) | Jin 2014 | ESCC | 110 (80) | Nontargeted | GC-MS | OPLS-DA MV model | Not reported | 43 | 39a |
(47) | Ma 2014 | ESCC | 111 (51) | Targeted | HPLC | PLS-DA MV model | Not reported | 12 | 11a |
Studies investigating gastric malignancies | |||||||||
(48) | Yu 2011 | GAC | 28 (9) | Nontargeted | GC-MS | PLS-DA MV model | Not reported | 12 | 3a |
(49) | Aa 2012 | GAC | 37 (17) | Nontargeted | GC-MS | OPLS-DA MV model | Not reported | 15 | 0a |
(50) | Song 2012 | GAC | 60 (30) | Nontargeted | GC-MS | OPLS-DA MV model | Not reported | 18 | 6a |
Studies investigating colorectal malignancies | |||||||||
(51) | Zhao 2007 | CRC | 258 (133) | Nontargeted | LC-MS | LRM (4 metabolites) | Sen 82%, Sp 93% | 10 | 8a |
(52) | Qiu 2009 | CRC | 129 (64) | Nontargeted | GC-MS; UPLC-MS | OPLS-DA MV model | Not reported | 38 | 16a |
(53) | Ludwig 2009 | CRC | 57 (38) | Nontargeted | 1H-NMR | PCA MV model | Sen “about 70%” | P values not reported | P values not reported |
(29) | Ritchie 2010 | CRC | 443 (223) | Nontargeted | FTICR-MS, NMR, UPLC MS/MS | 3 metabolites | AUROC for each 0.97 | 50 | 50a |
(54) | Kondo 2011 | CRC | 50 (42) | Nontargeted | GC-MS | PLS-DA MV model | Not reported | 9 | 0a |
(55) | Leichtle 2011 | CRC | 117 (59) | Targeted | ESI-MS/MS | LRM (2 metabolites and CEA) | AUROC 0.88 | 11 | 11 |
(56) | Ma 2012 | CRC | 38 (30) | Nontargeted | GC-MS | Hierarchical clustering (6 metabolites) | Sen 94% | 6 | 2a |
(57) | Nishiumi 2012 | CRC | 242 (119) | Nontargeted | GC-MS | LRM (4 metabolites) | AUROC 0.91, Sen 85%, Sp 85% | 76 | 46a |
(58) | Li 2013 | CRC | 104 (52) | Nontargeted | DI±ESI FTICR-MS | LRM (10 metabolites) | AUROC 0.99, Sen 98%, Sp 100% | P values not reported | P values not reported |
(59) | Li 2013 | CRC | 360 (120) | Nontargeted | LC-MS/MS | LRM (4 metabolites) | Sen 89% Sp 80% | 16 | 16 |
(30) | Ritchie 2013 | CRC | 5,883 | Targeted | MS/MS TQMRM | 1 metabolite | Sen 85.7%, Sp 53%b | 1 | n/a |
(60) | Tan 2013 | CRC | 204 (102) | Nontargeted | GC-TOFMS & UPLC-QTOFMS | OPLS-DA MV model | Sen 100%, Sp 100% | 72 | 62a |
(61) | Wang 2014 | CRC | 36 (16) | Nontargeted | SPME-GC/MS | PLS-DA MV model | Not reported | 4 | 4 |
(62) | Zamani 2014 | CRC | 66 (33) | Nontargeted | 1H-NMR | OPLS-DA MV model | Not reported | P values not reported | P values not reported |
(63) | Zhu 2014 | CRC | 234 (66) | Targeted | LC-MS/MS | PLS-DA MV model | AUROC 0.93, Sen 96%, Sp 80% | 42 | 7a |
Studies investigating multiple malignancies | |||||||||
(64) | Miyagi 2011 | GAC | 1,383 (199) | Targeted | HPLC-ESI-MS | LDA | AUROC 0.82–0.85 | 6 | 6 |
 | | CRC | 1,383 (199) | | | LDA | AUROC 0.87–0.88 | 10 | 10 |
(65) | Ikeda 2012 | ESCC | 50 (15) | Nontargeted | GC-MS | 2 metabolites | Sen 80%–81%, Sp 59%–90% | 9 | 1a |
 | | GAC | 50 (11) | | | 2 metabolites | Sen 70%–84%, Sp 71%–90% | 5 | 0a |
 | | CRC | 50 (12) | | | 3 metabolites | Sen 54.5%–81.8%, Sp 66.7%–91.6% | 12 | 0a |
NOTE: Where studies included discovery and validation cohorts, diagnostic metrics of the validation set were included for analysis.
Abbreviations: EAC, esophageal adenocarcinoma; ESCC, esophageal squamo-cellular carcinoma; GAC, gastric adenocarcinoma; CRC, colorectal adenocarcinoma; UPLC-TQMS, ultra-performance liquid chromatography-triple quadrupole mass spectrometry; NMR, nuclear magnetic resonance spectroscopy; ESI-TOFMS, electrospray ionization time-of-flight mass spectrometry; RRLC, rapid resolution liquid chromatography; GC-MS, gas chromatography mass spectrometry; HPLC, high-performance liquid chromatography; FTICR-MS, Fourier transform ion cyclotron resonance mass spectrometry; MS/MS, tandem mass spectrometry; TQMRM, triple quadrupole multiple reaction monitoring; DI, direct ionization; SPME, solid phase microextraction; PLS-DA, partial least squares discriminant analysis; ROC, receiver operating characteristic curve; PCA, principal component analysis; OPLS-DA, orthogonal projection to latent structures discriminant analysis; LRM, logistic regression model; LDA, linear discriminant analysis; MC, multiplicity correction; MV, multivariable; AUROC, area under receiver operating characteristic curve; Sen, sensitivity; Sp, specificity.
aWe applied a Bonferroni correction (α/n compared features).
bSpecificity not stated but calculated from available data.
Item | Did the study: | All studies (n = 29) | Lower reporting studies (n = 15) | Higher reporting studies (n = 14) | Pa |
---|---|---|---|---|---|
1 | Identify the article as a study of diagnostic accuracy? | 31% | 7% | 57% | 0.005** |
2 | State the research aims? | 38% | 20% | 57% | 0.046* |
3 | Describe population selection criteria? | 83% | 73% | 93% | 0.186 |
4 | Describe patient selection criteria? | 83% | 80% | 86% | 0.535 |
5 | State pattern of recruitment (e.g., consecutive)? | 14% | 0% | 29% | 0.042* |
6 | State if data collected before or after testing? | 55% | 33% | 79% | 0.018* |
7 | State rationale of the reference standard? | 62% | 47% | 79% | 0.082 |
8 | Describe the technical details of the index test? | 97% | 93% | 100% | 0.517 |
9 | Define units and/or cut-offs for the index test and reference standard? | 62% | 33% | 93% | 0.001** |
10 | Describe who carried out testing? | 34% | 20% | 50% | 0.095 |
11 | Describe whether those carrying out testing were blinded? | 93% | 93% | 93% | 0.741 |
12 | Describe methods of determining test accuracy and statistical uncertainty? | 55% | 33% | 79% | 0.018* |
13 | Describe methods for calculating test reproducibility, if done? | 45% | 0% | 93% | <0.001*** |
14 | State when the study was performed? | 38% | 33% | 43% | 0.442 |
15 | Describe basic demographics of participants? | 55% | 33% | 79% | 0.018* |
16 | State the number of eligible patients not included, and the reasons why? | 7% | 7% | 7% | 0.741 |
17 | Give the time interval between the index test and reference standard? | 14% | 7% | 21% | 0.272 |
18 | Describe the severity of the disease in participants with defined criteria? | 59% | 40% | 79% | 0.041* |
19 | Provide cross-tabulation, etc. of patient assignment by tests? | 21% | 7% | 36% | 0.070 |
20 | Describe adverse events during testing? | 0% | 0% | 0% | 1.000 |
21 | Provide estimates of diagnostic accuracy and statistical uncertainty? | 34% | 7% | 64% | 0.002** |
22 | Describe how unclassified patients and outliers were analyzed? | 10% | 7% | 14% | 0.473 |
23 | Describe estimates of accuracy between subgroups or centers, if done? | 21% | 7% | 83% | 0.067 |
24 | Provide estimates of test reproducibility, if done? | 10% | 0% | 50% | 0.121 |
25 | Discuss the clinical applicability of the study findings? | 90% | 87% | 93% | 0.527 |
aFisher's exact test.
*, P < 0.05; **, P < 0.01; ***, P < 0.001. Abridged questions given.
The majority (25/29) were phase I biomarker discovery studies investigating a single cancer. Nine studies investigated esophageal cancer (39–47), 3 gastric cancer (48–50), and 15 colorectal cancer (29, 30, 51–63). Two studies investigated more than one GI malignancy (64, 65). One group studying colorectal cancer built on earlier discovery findings in a large, multicenter, phase IV prospective cohort study (29, 30), and subsequently marketed a blood test that evaluates risk for colorectal cancer. A second group reported an external validation of earlier promising biomarker findings in an internationally distinct cohort (51, 59).
Reported methods were highly heterogeneous (see Table 1). With regard to sample procurement, there was variation in initial venepuncture receiver (7 types), blood fraction (plasma, serum, whole blood, and blood-headspace), sample standing time, centrifuge parameters, methods to check for hemolysis, and storage parameters. All studies reported using a standardized sample collection procedure. Sample preparation for chemical analysis was similarly diverse, with varied choices in deproteination, extraction, use of standards, and derivatization of analytes. Moreover, numerous instruments were employed including NMR (13%), gas chromatography–MS (30%), liquid chromatography–MS (33%), a combination of approaches (13%), and a minority of singly investigated techniques.
Methodologic and reporting quality
The risk of methodologic bias assessed with QUADAS-2 was often high or unclear (see Supplementary Fig. S1 and Supplementary Tables S1 and S2). Our assessment of the quality of the metabolomic approach found that 52% omitted appropriate measures to ratify compound identification or quantitation, and/or relevant quality control procedures. Only 34% of studies endoscopically assessed all participants including negative controls. In one fifth of studies, a reference standard was not clearly defined (e.g., “patients with cancer” instead of “endoscopically/histologically/radiologically confirmed cancer”). Other methodologic concerns included nonconsecutive patient selection, case–control design, and declared financial interest.
Reporting quality of these biomarker studies was medium-to-poor (median score 10/25, see Table 2 and Supplementary Table S3). Although four studies reported more than 70% of STARD items, no study achieved all items. There was limited reporting of details of patient recruitment, the reasons for exclusion of eligible patients, and how model outliers were interpreted. No study reported on adverse events encountered during testing (including endoscopic complications). Studies with more rigorous reporting standards were more likely to discuss the timing of recruitment, provide comprehensive demographics data, define cut-offs, and estimate test reproducibility and statistical uncertainty (all P < 0.05, see Table 2).
Study findings
All studies described the diagnostic potential of their methods, but only 59% (17/29) reported sensitivity, specificity, and/or AUROC metrics. The majority of these (14 of 17 studies) derived diagnostic indices directly from a multivariate model (MVM), usually based on partial least squares regression, orthogonal projection to latent structures, or logistic regression.
In total, the studies reported at least 528 unique blood metabolites that significantly differed between patients with GI cancer and controls (Ritchie and colleagues only reported a subset of significant markers; ref. 29). After homogenizing for multiplicity error, there were 246 significantly different metabolites (see Table 3 and Supplementary Table S4). Disease-specific metabolite relationships were visualized using Metscape networks, and a cancer-specific list of affected EHMN subdivisions was constructed (see Table 3). Elements of up to 34 of approximately 70 EHMN subdivisions were modulated. An example subnetwork is given in Fig. 2, demonstrating the extensive pathway participation and reciprocation of amino acids modulated in colorectal cancer (with elements of six amino acid EHMN subdivisions).
Metabolic pathway | EAC | ESCC | GC | CRC |
---|---|---|---|---|
Aminosugars metabolism | ✓ | |||
Arachidonic acid metabolism | ✓ | |||
Bile acid biosynthesis | ✓ | ✓ | ✓ | |
Biopterin metabolism | ✓ | ✓ | ||
Butanoate metabolism | ✓ | ✓ | ✓ | |
C21-steroid hormone biosynthesis | ✓ | ✓ | ||
De novo fatty acid biosynthesis | ✓ | ✓ | ✓ | |
Di-unsaturated fatty acid beta oxidation | ✓ | ✓ | ✓ | |
Endohydrolysis of 1,4-alpha-D-glucosidic polysaccharide linkages | ✓ | |||
Fructose and mannose metabolism | ✓ | ✓ | ||
Galactose metabolism | ✓ | ✓ | ✓ | |
Glycerophospholipid metabolism | ✓ | ✓ | ✓ | ✓ |
Glycine, serine, alanine, and threonine metabolism | ✓ | ✓ | ✓ | |
Glycolysis and gluconeogenesis | ✓ | ✓ | ||
Glycosphingolipid metabolism | ✓ | ✓ | ||
Histidine metabolism | ✓ | ✓ | ✓ | ✓ |
Leukotriene metabolism | ✓ | ✓ | ||
Linoleate metabolism | ✓ | ✓ | ✓ | |
Lipoate metabolism | ✓ | ✓ | ✓ | |
Lysine metabolism | ✓ | ✓ | ✓ | ✓ |
Methionine and cysteine metabolism | ✓ | ✓ | ✓ | |
Pentose phosphate pathway | ✓ | |||
Phosphatidylinositol phosphate metabolism | ✓ | ✓ | ||
Porphyrin metabolism | ✓ | ✓ | ||
Prostaglandin formation | ✓ | |||
Purine metabolism | ✓ | ✓ | ✓ | |
Pyrimidine metabolism | ✓ | ✓ | ✓ | |
Saturated fatty acids beta-oxidation | ✓ | ✓ | ||
Squalene and cholesterol biosynthesis | ✓ | ✓ | ||
TCA cycle | ✓ | ✓ | ✓ | |
Tryptophan metabolism | ✓ | ✓ | ✓ | ✓ |
Tyrosine metabolism | ✓ | ✓ | ✓ | |
Urea cycle and arginine, proline, glutamate, aspartate, and asparagine metabolism | ✓ | ✓ | ✓ | ✓ |
Valine, leucine, and isoleucine degradation | ✓ | ✓ | ✓ | ✓ |
Vitamin B3 metabolism | ✓ | |||
Vitamin B5 - CoA biosynthesis and pantothenate | ✓ | ✓ | ✓ | |
Vitamin B9 (folate) metabolism | ✓ | ✓ | ||
Vitamin E metabolism | ✓ | |||
Vitamin H (biotin) metabolism | ✓ | ✓ | ✓ | |
Total affected pathways | 23 | 28 | 13 | 34 |
Esophago-gastric cancer
Including multicancer studies, 10 studies investigated esophageal cancer: three in esophageal adenocarcinoma and seven in esophageal squamous cell carcinoma (ESCC). In esophageal adenocarcinoma, Zhang and colleagues used a dual NMR and LC-MS approach to derive an MVM with good discrimination (AUROC 0.95; ref. 42). Several compounds were cross-validated on both platforms. All studies investigating ESCC were conducted in China and Japan. After applying electrospray-MS to serum samples, Xu and colleagues found that a logistic regression model based on seven metabolites was 90% sensitive and 96% specific for discriminating cancer from healthy controls in an independent validation set (45). Apart from one small series (65), this was the only study that applied diagnostic metrics to ESCC metabolomic data.
Five metabolomic studies investigated serum biomarkers solely in gastric cancer; all were of pilot character, with fewer than 30 cancer patients included. The most promising findings were reported by Miyagi and colleagues, in which a decision model based on linear discriminant analysis of serum biomarkers from 199 patients had AUROCs of 0.874 to 0.881 (64).
Colorectal cancer
A total of 17 studies investigated colorectal cancer. In the United States, Zhao and colleagues were the first to apply a serum metabolomic approach to this disease in 2007 (51). They applied a targeted LC-MS analysis of lysophosphocholines (LPC) within a high-quality study design, and found a significant and consistent reduction in many of these species in patients with colorectal cancer. Using four LPC species, a logistic regression model was built, which, in a separate validation cohort, demonstrated a sensitivity of 82% and specificity of 93% for detecting cancer. More recently, these findings were externally validated in China; Li and colleagues found a comparable reduction in the same biomarkers, with improved diagnostic sensitivity (89%) but reduced specificity (80%; ref. 59). In 2012, Leichtle and colleagues reported a targeted amino acid analysis, with fastidious standardization of “preanalytic” methods and data handling (55). They found 12 amino acids to be significantly different between cancer and controls. However, AUROC statistics were unsatisfactory; glycine was the best discriminatory metabolite, with an AUROC of 0.707. A combination of tyrosine, glycine, and carcinoembryonic antigen (CEA) provided an improved AUROC when compared with CEA alone (AUROC 0.878 vs. 0.794). Nishiumi and colleagues also developed a risk prediction model based on four metabolites with an AUROC, sensitivity, and specificity of at least 85%; however, they also commented on poor univariate performance (57). Moreover, the authors found significant inter- and intraday variability in metabolites of interest, and recommended caution in interpreting their findings.
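To compare the sensitivity and specificity pairs quoted above on a single scale, one option is Youden's J statistic (sensitivity + specificity − 1). The following is a minimal sketch for illustration only: the function name `youden_j` is hypothetical, and this summary statistic is our own calculation, not one reported by the cited studies.

```python
def youden_j(sensitivity: float, specificity: float) -> float:
    """Youden's J = sensitivity + specificity - 1.

    0 indicates an uninformative test and 1 a perfect one.
    Inputs are proportions in [0, 1], not percentages.
    """
    for value in (sensitivity, specificity):
        if not 0.0 <= value <= 1.0:
            raise ValueError("sensitivity and specificity must lie in [0, 1]")
    return sensitivity + specificity - 1.0


# Sensitivity/specificity pairs as quoted in the text above.
reported = {
    "Zhao and colleagues (ref. 51)": (0.82, 0.93),
    "Li and colleagues (ref. 59)": (0.89, 0.80),
}

for study, (sens, spec) in reported.items():
    print(f"{study}: J = {youden_j(sens, spec):.2f}")
```

On this measure, the gain in sensitivity in the external validation does not fully offset the loss in specificity (J of 0.69 vs. 0.75), although J weights both error types equally, which may not suit a screening setting.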
Ritchie and colleagues undertook a robust biomarker discovery approach that initially utilized Fourier transform-ion cyclotron resonance MS to generate detailed metabolic signatures of serum from one Canadian and two Japanese populations (29). Instead of electron multiplier, count-based mass detection, this preconcentrating, ion-resonance technology offers ultra-high resolution over a wide mass range (66). After compound elucidation with NMR and MS/MS technologies, the authors were able to identify a large group of ultra-long, hydroxylated, polyunsaturated fatty acids (carbon length 26–38) that were persistently reduced in all stages of colorectal cancer. They developed a highly specific MS method for quantifying the most promising chemical species, termed GTA-446, and proceeded to a multicenter prospective cohort study (30). This single metabolite was 85.7% sensitive for predicting colorectal cancer risk across all stages. The authors did not publish a specificity metric, although this is estimated to be 52.1% to 80.1% (age-matched cohorts drawn from a “referred for colonoscopy” or general population, respectively).
In 2013, Tan and colleagues utilized both GC-MS and LC-TOF-MS platforms to assure wide coverage of the serum metabolome in a targeted analysis of 158 metabolites, all verified with analytical standards (60). They derived an MVM that correctly classified all 80 participants in an independent validation set. After correction for multiplicity, 62 metabolites were significantly different between cancer and controls. Most recently, Zhu and colleagues improved an MVM of metabolite differences by adjusting for four clinical factors (63). The improved model was 96% sensitive and 80% specific for cancer discrimination.
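The practical effect of the wide specificity estimate can be made concrete with likelihood ratios, which depend only on sensitivity and specificity and so need no assumption about prevalence. The sketch below uses the figures quoted above; the function name `likelihood_ratios` is hypothetical, and the calculation is ours rather than one reported by the authors.

```python
def likelihood_ratios(sensitivity: float, specificity: float) -> tuple[float, float]:
    """Return (LR+, LR-) for a binary diagnostic test.

    LR+ = sensitivity / (1 - specificity)
    LR- = (1 - sensitivity) / specificity
    Inputs are proportions strictly between 0 and 1 for specificity,
    so that neither ratio divides by zero.
    """
    if not 0.0 <= sensitivity <= 1.0:
        raise ValueError("sensitivity must lie in [0, 1]")
    if not 0.0 < specificity < 1.0:
        raise ValueError("specificity must lie strictly between 0 and 1")
    lr_pos = sensitivity / (1.0 - specificity)
    lr_neg = (1.0 - sensitivity) / specificity
    return lr_pos, lr_neg


SENSITIVITY = 0.857  # as quoted in the text

# Lower and upper specificity estimates quoted in the text.
for label, spec in [("referred for colonoscopy", 0.521), ("general population", 0.801)]:
    lr_pos, lr_neg = likelihood_ratios(SENSITIVITY, spec)
    print(f"{label}: LR+ = {lr_pos:.2f}, LR- = {lr_neg:.2f}")
```

Across the estimated specificity range, the positive likelihood ratio varies from roughly 1.8 to 4.3, which illustrates why an explicitly reported specificity matters when judging a single-metabolite test.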
Discussion
The aim of this systematic review was to evaluate blood metabolomics for endoluminal cancer. Across a large pooled cohort, at least 246 significant metabolic differences were reported after strict multiplicity correction, with only one being approved for clinical use to date (30). This test (Cologic, Phenomenome Discoveries, Saskatoon, Canada) was introduced in 2013, and is available for colorectal cancer screening by mail in Canada and Japan. The test is offered as an alternative to the fecal occult blood test, with better sensitivity and similar analysis times. There is currently no metabolomic blood test for upper GI endoluminal cancer.
Despite wide variety in sampling and analytics, almost all studies found significant differences between cancer and control groups, reaffirming the profound disturbance these cancers impart on the systemic metabolome. On the available evidence, elements of up to half of the EHMN subdivisions were affected. In colorectal cancer, established oncophenotypes were consistently reported (Table 3), including evidence of glycolytic switch, de novo lipogenesis, and amino acid mobilization (67–69). Upper GI malignancy displayed similarly broad metabolite differences despite being less extensively investigated.
A perceived weakness of systemic metabolomic biomarkers is that noncancer factors (e.g., nutritional state or the microbiome) have too great an influence to support a specific test. In this review, there was evidence that metabolite modulation was discernible at very early stages of GI cancer (30, 46, 52, 57–59). Several studies reported incremental metabolomic changes with invasive (41, 43, 58, 59) and metastatic (46, 70, 71) progression. In addition, three studies in colorectal cancer that were excluded from the appraisal process owing to their design nonetheless demonstrated findings pertinent to metabolomic dynamics. These included (i) no evidence of metabolic markers that predicted future risk of developing endoluminal malignancy (31, 32), and (ii) a normalization effect of the serum metabolome after successful colorectal cancer resection (33). These trends indicate that cell autonomous factors are likely to exert the dominant stimulus over systemic metabolic homeostasis, which is consistent with the constant and progressive influence associated with malignancy. Moreover, ultra-long polyunsaturated fatty acids, phosphocholines, and amino acids were found to be similarly perturbed in both western and Asian colorectal cancer populations (only these three groups were studied in both cohorts). This geographic independence further suggests a fundamental function for these phenotypes in cancer biology.
The most frequently reported discriminating metabolites were amino acids (e.g., serum histidine, downregulated in five colorectal studies). These compounds are easily identified on both NMR and MS platforms, and several studies employed a targeted amino acid methodology (47, 55, 63, 64). However, amino acids were not specific as clinical cancer biomarkers. Certain fatty acids (30) and phospholipids (51) outperformed amino acids, presumably as they represent singular endpoints of specific oncotypic processes, rather than central elements of multilateral, highly interconnected networks that are variably perturbed (as illustrated in Fig. 2). Moreover, specific lipids may be underrepresented in this review as these are not easily resolved by NMR and represent challenging targets for MS analysis. The debate concerning the utility of sarcosine in prostate cancer further illustrates the problem of proposing central metabolic elements as specific biomarkers (72–74). Furthermore, as the affordability and performance of high-resolution analytics improve, trace metabolic features will be discovered whose clinical value may be less impaired by diverse pathway participation.
The findings of this review are subject to several limitations. The chief limitation is illustrated by study quality assessments, in that a significant proportion of studies were exploratory and offered results with low interpretable value. There is a particular paucity of evidence in gastric cancer. Second, all studies were conducted in a secondary care setting. The clinical applicability of these tests should be considered in study design; these technologies will not supersede endoscopy, as it is currently required to (i) provide tissue, (ii) plan surgery, (iii) distinguish synchronous pathology, and (iv) provide endotherapy (75, 76). Such nonendoscopic triage tools should therefore be deployed to minimize diagnostic uncertainty and improve service efficiency, and will be maximally influential in primary care (7, 77). By extension, augmenting metabolomic biomarker performance with clinical features and standard serologic markers was an uncommon but seemingly successful strategy (55, 63). Third, the fact that all studies reported positive findings suggests a potential for publication bias. Finally, the highly heterogeneous methods necessitate caution with any attempt at synthesis, including the qualitative networking undertaken herein; this underlines the critical importance of adherence to standardization recommendations (22–26).
Conclusions
This review indicates that the analytical evolution of serum metabolomics may benefit GI cancer diagnostics. The best data in this study utilized consensus protocols to standardize sample procurement and processing, used high-resolution analytics, and clearly reported analyses of clinical value. Further discovery programs should be supported with refinement strategies for potential metabolic confounders (e.g., anemia, liver disease, and xenobiotics), and aim to test emerging diagnostic tools against similar primary care alternatives. Finally, validated biomarkers require adaptation from spectrometer-based protocols to point-of-care platforms, thus exploiting a major advantage of metabolomics over other molecular approaches.
Disclosure of Potential Conflicts of Interest
No potential conflicts of interest were disclosed.
Grant Support
G.B. Hanna received an award from the National Institute for Health Research (NIHR) to support this work (NIHR-Diagnostic Evidence Co-operative at Imperial College Healthcare NHS Trust). The views expressed are those of the authors and not necessarily those of the NHS, the NIHR, or the Department of Health.
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.