Abstract
This review aims to develop an appropriate review tool for systematically collating metabolites that are dysregulated in disease and applies the method to identify novel diagnostic biomarkers for hepatocellular carcinoma (HCC). Studies that analyzed metabolites in blood or urine samples where HCC was compared with comparison groups (healthy, precirrhotic liver disease, cirrhosis) were eligible. Tumor tissue was included to help differentiate primary and secondary biomarkers. Searches were conducted on Medline and EMBASE. A bespoke “risk of bias” tool for metabolomic studies was developed adjusting for analytic quality. Discriminant metabolites for each sample type were ranked using a weighted score accounting for the direction and extent of change and the risk of bias of the reporting publication. A total of 84 eligible studies were included in the review (54 blood, 9 urine, and 15 tissue), with six studying multiple sample types. High-ranking metabolites, based on their weighted score, comprised energy metabolites, bile acids, acylcarnitines, and lysophosphocholines. This new review tool addresses an unmet need for incorporating quality of study design and analysis to overcome the gaps in standardization of reporting of metabolomic data. Validation studies, standardized study designs, and publications meeting minimal reporting standards are crucial for advancing the field beyond exploratory studies.
Introduction
Metabolomic techniques utilize advanced analytic chemical technology, such as nuclear magnetic resonance spectroscopy and gas or liquid chromatography coupled to mass spectrometry (GC/MS or LC/MS), to detect and quantify compounds in complex chemical mixtures simultaneously. Their application to the analysis of clinical samples, such as plasma or urine, holds promise to provide novel solutions to disease diagnosis and therapeutic management (1, 2). However, there is a tendency in the field of metabolic profiling to identify potential biomarkers without further validating them, and thus the potential of the technology is rarely harnessed to provide a genuine clinical translation. Moreover, the reported metabolomic studies vary greatly in quality, with aspects such as sample number, reporting of analytic methodology, and statistical modeling of the data not always considered or reported. Therefore, if the technology is to be clinically useful, it is essential to have a method that sifts through the studies and weights them according to robust criteria relating to study quality. This review aims to develop an appropriate review tool for systematically collating metabolites in accessible biofluids, such as blood or urine, that have been found to be statistically discriminatory between HCC and comparison groups, with the purpose of identifying the probable diagnostic biomarkers with the best potential to move forward. Articles identifying candidate biomarkers from tumor (T) tissue are also included with a view to helping ascertain whether the circulating or excreted “biomarkers” are directly related to tissue reprogramming. The large volume of publications on metabolic profiling of HCC has not yet been systematically reviewed to gather the evidence and identify contradictory reports; published reviews have either illustrated the diagnostic promise of metabonomic technology with examples (3, 4), covered targeted metabolite panels (5), or focused on specific analytical platforms or biofluids (6).
In 2018 worldwide, liver cancer was predicted to be the sixth most common cancer (fifth in men) and fourth leading cause of cancer death (second in men), with incidence and mortality rates two to three times higher in men than in women (7). Liver cancer is estimated to make up 4.7% of all new cancer cases, yet causes 8.2% of all cancer-related deaths worldwide (7). There are over 800,000 new cases of liver cancer annually and 13 countries where liver cancer is the leading cause of cancer death (7). Up to 85% of primary liver cancers are hepatocellular carcinomas (HCC). HCC rates are typically higher in less economically developed countries, with the highest prevalence in Northern and Western Africa and Eastern Asia (8, 9). Chronic hepatitis B and C infection are among key risk factors for HCC (10). Although the median age at diagnosis varies according to population, most HCCs develop in adults of working age (18–65 years), leading to significant burden on health care resources and productivity (9, 10).
A low-cost, effective HCC diagnostic test is needed for earlier detection to allow a higher chance of successful treatment and increased survival. This would reduce the health care resource burden and offer more years of economic productivity. The currently available noninvasive HCC marker, alpha-fetoprotein (AFP), is considered unreliable by current American (11) and European (12) guidelines owing to poor sensitivity and specificity of only around 70%. AFP is therefore inappropriate for reliable HCC diagnosis. A background of cirrhosis accounts for 80% of HCCs (13) and HCC is a major cause of death in patients with cirrhosis (14); therefore, 6-monthly ultrasound (US) scans are recommended for surveillance and diagnosis. However, US has its own limitations because it is operator dependent and so its sensitivity can vary (14). Despite their poor diagnostic performance, AFP and US are heavily relied upon globally in resource-limited areas, including Africa (9). HCC is typically asymptomatic in its early stages and therefore presents late with a poor prognosis. As a result, only 10%–20% of patients with HCC can be diagnosed at a very early stage (6). Consequently, a more effective and low-cost diagnostic marker is needed to decrease the health care burden and improve the overall survival of patients with HCC (15), along with novel imaging technologies (16).
The search for novel diagnostic biomarkers in HCC using metabonomic methods has been motivated by the understanding that substantial metabolic reprogramming is necessary for malignant cells to drive and sustain tumorigenesis (17). In patients with HCC, quantities of different metabolites are likely deranged because of changes in central energy metabolism (cancer cell preference for anaerobic respiration, even when oxygen is available: the Warburg hypothesis) and in lipid profile to support rapid cell membrane turnover (18, 19). To date, numerous studies have reported discriminatory metabolites between HCC and comparison groups with the aim of identifying potential diagnostic biomarkers. These studies have identified hundreds of metabolites in various biofluids that differentiate individuals with HCC from either controls or participants with cirrhosis but not HCC.
This review should pave the way for future research by clarifying the most appropriate direction on which to focus for advancing the development of HCC diagnostics using metabolic profiling. A new method for identifying candidate biomarkers that takes into account the quality of the primary studies will facilitate the development of an effective low-cost biomarker test for HCC, which has the potential to have global medical impact.
Materials and Methods
Search strategy
Eligible studies were human case–control studies that satisfied the following criteria: (i) the study compared patients diagnosed with HCC with one or more of the following comparison groups: healthy controls, any precirrhotic liver disease [LD; e.g., chronic hepatitis, nonalcoholic fatty LD (NAFLD), etc.], cirrhosis, and nontumor (NT) tissue (for tissue studies); (ii) the study investigated liver tissue, blood (serum or plasma), and/or urine samples; and (iii) the study analyzed the samples and reported the metabolites found to be statistically significantly different between HCC and the comparison groups. Studies were excluded if they investigated compounds of specific dietary components (such as from tea), xenobiotics (such as aflatoxin exposure), hormones (such as androgen levels), or reactive oxygen species. Studies were limited to original research articles published in English.
Literature searches were conducted on the databases MEDLINE and Embase via Ovid. The search strategies, which included both MeSH terms and keywords, were developed with a librarian and were validated manually (Supplementary Table S1). Studies published up to February 5, 2019 were retrieved using the search strategies, with additional papers identified manually through searches on PubMed and Google Scholar. This systematic review was performed in accordance with the PRISMA-DTA (diagnostic test accuracy) statement (ref. 20; Supplementary Table S2) and has been registered on PROSPERO: CRD42018095412 (21).
Study selection and data extraction
Initial screening using titles and abstracts, and subsequent full-text screening were performed by two investigators independently at each stage (M.R.A. U and A. Alkhatib, and M.R.A. U and E.Y.-L. Shen, respectively). Included studies were then divided into four groups and were extracted by investigators (A. Alkhatib, AU, C. Cartlidge, E.Y.-L. Shen) and reviewed independently. Any discrepancy or disagreement was resolved through discussion. For each article, two sets of data were extracted: study characteristics and the discriminatory metabolites reported. Data extracted for study characteristics were the number of participants in each study group recruited (main and validation cohort separately), the country where the study was conducted, the underlying etiology of HCC patient LD, the condition of sample collection, whether the comparison groups were matched according to sex and age, the analytic method used, and the source of funding. For the reported discriminant metabolites, only those found to be statistically significant, as defined by the study, were extracted. Data items include compound name, comparison group compared, statistical test used for determining significance, direction of change (increased or decreased in HCC), fold change (where available), and P value (or equivalent). To unify synonyms of the names of the compounds reported, the reported names were matched with those in the Human Metabolome Database (ref. 22; for nonlipids) or LIPIDMAPS (ref. 23; for lipids). Those not listed on either of these databases or those that were not individual chemical entities (e.g., groups of compounds or ratios) were unified manually.
Risk of bias assessment
A bespoke tool to assess the risk of bias (RoB) of the studies reviewed was developed on the basis of established tools (24, 25) and minimal reporting standards for metabonomics studies (refs. 26, 27; Table 1). Each publication was assessed according to four domains: (i) study design, (ii) chemical analysis, (iii) data analysis, and (iv) the reporting of discriminant metabolites, with each domain containing three to six items. For each item, criteria for high, medium, and/or low risk were defined and were assigned −1, 0, and +1 point, respectively. The total score from the RoB assessment was the sum of all four domains. The RoB assessment was developed, trialed, and modified on the basis of feedback from investigators. Once finalized, it was implemented in the same manner as the data extraction process.
Table 1. A bespoke RoB assessment for metabonomic studies.
No. | Item | Category | Risk of bias |
---|---|---|---|
Domain 1 – Design of experiment | | | |
1.1 | Number of HCC cases | n ≥ 50 | Low |
 | | 10 ≤ n < 50 | Medium |
 | | n < 10 | High |
1.2 | Are participant characteristics reported by study group (HCC, cirrhosis, etc.) for each cohort (discovery and validation)? | Yes | Low |
 | | No | High |
1.3 | Are the diagnostic criteria for HCC and other liver conditions (where applicable) stated? | Stated – HCC confirmed with 2 modalities of imaging (any 2 of MRI, CT, CEUS) or histologically proven | Low |
 | | Stated – other methods | Medium |
 | | Not stated | High |
1.4 | Are inclusion and exclusion criteria stated? | Stated | Low |
 | | Not stated | Medium |
1.5 | Were potential confounders (e.g., sex, age) discussed and taken into account in data analysis? (Note 1) | Discussed and taken into account | Low |
 | | Discussed | Medium |
 | | Not discussed | Medium |
1.6 | Was there validation using an independent cohort? | Yes | Low |
 | | No | Medium |
Domain 2 – Chemical analysis | | | |
2.1 | Are the conditions of sample collection, storage and transportation stated? | Stated, no concern | Low |
 | | Stated, concern present | Medium |
 | | Not stated | High |
2.2 | Were samples randomised prior to analysis? | Yes | Low |
 | | Not stated | Medium |
2.3 | Is preanalysis sample processing stated? | Stated, no concern | Low |
 | | Stated, minor concern present | Medium |
 | | Stated, major concern present | High |
 | | Not stated | High |
2.4 | Was a pooled quality control sample used and its reproducibility shown? | Yes – shown | Low |
 | | Yes – not shown | Medium |
 | | Not mentioned | High |
 | | Not applicable | NA |
2.5 | Does the reporting of chemical analysis meet minimum reporting standards? (Note 2) | No concern | Low |
 | | Minor concern present | Medium |
 | | Major concern present | High |
Domain 3 – Data analysis | | | |
3.1 | Is the data analysis workflow clearly described? | Yes | Low |
 | | Unclear, minor | Medium |
 | | Unclear, major | High |
3.2 | Was an established software/algorithm used and if not is the code provided and validated for the new analysis? | Yes, established software used or sufficient information on analysis pipeline provided | Low |
 | | No, new software used with incomplete information provided on analysis pipeline | Medium |
 | | No, new software used but code either not provided or no information on analysis pipeline | High |
3.3 | Were the levels of significance corrected for multiple testing? | Yes | Low |
 | | No | High |
Domain 4 – Reporting of discriminant metabolites | | | |
4.1 | Level of confidence in metabolite identification (Note 3) | 1 – identified | Low |
 | | 2 – putatively annotated | Medium |
 | | 3 – putatively characterised compound class | High |
 | | 4 – unknown | High |
4.2 | Is the variability (e.g., interquartile range) of metabolite level reported? | Yes | Low |
 | | No | Medium |
4.3 | Is the precise P value (or equivalent) reported? | Precise value stated | Low |
 | | Only range reported | Medium |
 | | Direction of change only | High |
Note: 1. Methods for taking confounders into account include controlling for them in adjusted models or by stratified analysis. “Discussed” and “Not discussed” carry equal weighting because merely discussing the fact that confounder adjustment did not occur does not alter statistical confidence in the interpretation.
2. Basic information of experimental setup and condition required for the reporting of each analytic technology, see Sumner, et al., 2007 (27) and additional notes in Goodacre, et al., 2007 (26).
3. Based on the recommendations of the Chemical Analysis Working Group of the Metabolomics Standards Initiative regarding proposed minimum metadata relative to metabolite identification, see Sumner, et al., 2007 (27).
Abbreviations: CEUS, contrast-enhanced ultrasound; CT, computed tomography; HCC, hepatocellular carcinoma; MRI, magnetic resonance imaging.
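The point assignment described for the RoB tool (low = +1, medium = 0, high = −1, summed over items) can be illustrated with a minimal Python sketch. This is not the authors' code: the function name `rob_score`, the example ratings, and the choice to skip items rated "NA" are all assumptions made for illustration, since the text does not state how "Not applicable" items were handled.

```python
# Points per rating as stated in the text: low = +1, medium = 0, high = -1.
POINTS = {"low": 1, "medium": 0, "high": -1}


def rob_score(ratings):
    """Sum RoB points over rated items.

    `ratings` maps an item number (e.g. "1.1") to "Low", "Medium", "High" or "NA".
    Skipping "NA" items (e.g. item 2.4 for targeted analyses) is an assumption.
    """
    return sum(POINTS[r.lower()] for r in ratings.values() if r.upper() != "NA")


# Hypothetical ratings for one publication (Domains 1-3, 14 items).
publication = {
    "1.1": "Medium", "1.2": "Low", "1.3": "Low",
    "1.4": "Low", "1.5": "Medium", "1.6": "Medium",
    "2.1": "Low", "2.2": "Medium", "2.3": "Low", "2.4": "NA", "2.5": "Low",
    "3.1": "Low", "3.2": "Low", "3.3": "High",
}
# Hypothetical Domain 4 ratings for one metabolite reported in that publication.
metabolite = {"4.1": "Medium", "4.2": "Low", "4.3": "Low"}

publication_points = rob_score(publication)                 # Domains 1-3
entry_points = publication_points + rob_score(metabolite)   # all four domains
print(publication_points, entry_points)
```

Keeping the publication-level and metabolite-level ratings separate mirrors the description that Domains 1 to 3 apply to the whole publication while Domain 4 is scored per reported metabolite.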
Synthesis of results
We aimed to produce a list of metabolites that were found to be upregulated or downregulated in a consistent manner for each sample type by incorporating three factors. First, the number of times each metabolite was reported to be significantly changed in one direction, as opposed to the other by taking the sum of vote counts (+1 for upregulation in HCC compared with the comparison group, −1 for downregulation). Alternatively, to incorporate the extent of change, log base 2 of fold change (log2FC) values were used in place of vote count, where fold change values were reported. For entries without fold change values reported, they were approximated by an estimate of the median of reported fold change value for each direction of change in each comparison. Second, each report's contribution for a metabolite was scaled by the RoB score. The RoB results from Domains 1–3 for the reporting publication were added to the Domain 4 metabolite-specific results, giving a total four-domain RoB score of a maximum of 17 points (+1 for low, 0 for medium, and −1 for high RoB for each of the 17 RoB assessment items). Metabolite entries with a final RoB score of zero or below (n = 48) were removed from the final analysis. Third, a discordance penalty was applied to penalize metabolites that had contradictory reports: the number of reports of change in one direction versus reports of change in the other direction.
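The estimation of missing fold change values described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the authors' implementation: the function name `impute_log2fc`, the dictionary keys, and the example values are hypothetical, and fold change is assumed to be expressed as the ratio of HCC to the comparison group (so that decreases have values below 1).

```python
import math
from statistics import median


def impute_log2fc(entries):
    """Return one log2FC per entry within a single comparison.

    Each entry is a dict with "direction" (+1 up in HCC, -1 down in HCC) and
    "fold_change" (HCC / comparison ratio, or None when not reported).
    Missing values are replaced by the median log2FC of the entries reported
    in the same direction. A direction with no reported fold change at all
    raises KeyError; this sketch does not handle that case.
    """
    reported = {+1: [], -1: []}
    for e in entries:
        if e["fold_change"] is not None:
            reported[e["direction"]].append(math.log2(e["fold_change"]))
    medians = {d: median(v) for d, v in reported.items() if v}

    return [
        math.log2(e["fold_change"]) if e["fold_change"] is not None
        else medians[e["direction"]]
        for e in entries
    ]


# Hypothetical entries for one comparison.
entries = [
    {"direction": +1, "fold_change": 2.0},
    {"direction": +1, "fold_change": 4.0},
    {"direction": +1, "fold_change": None},   # estimated from the "up" median
    {"direction": -1, "fold_change": 0.5},
    {"direction": -1, "fold_change": None},   # estimated from the "down" median
]
print(impute_log2fc(entries))
```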
To synthesize the extracted list of discriminatory metabolites across the publications reviewed, a weighted score system was developed to incorporate four factors: vote count (+1 for an upvote, if a metabolite was reported to be significantly higher in HCC than in the comparison group, and −1 for a downvote), the extent of change represented by log2FC, the RoB of the publication reporting the finding, and an overall discordance penalty. For each metabolite in each comparison (HCC vs. healthy, HCC vs. noncirrhotic LD, HCC vs. cirrhosis, or HCC tumor vs. matched NT liver tissue) in each sample type, the weighted score was calculated as

$$\text{weighted score} = p \times \sum_{i=1}^{n} \left( R_i \times C_i \right)$$

where n was the number of publications reporting significant differences between HCC and controls; p was a penalty for discordant reports, calculated as |sum(upvote, downvote)|/n (for example, a metabolite that was reported to be significantly higher in HCC compared with healthy controls in four studies but lower in one study is allocated a penalty p = (4 − 1)/5 = 0.6); R_i was the total score from the RoB assessment for report i; and C_i was the extent of change for report i, defined as log2FC if fold change was reported; otherwise, an estimate using the median of the log2FC values reported in each particular comparison and direction of change was used.
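A minimal Python sketch of this calculation follows. It assumes the score takes the multiplicative form p × Σ(R × C), which matches the description in the Results of scaling each log2FC by the RoB score, summing the reports, and weighting the sum by the penalty. The function name `weighted_score` and the example numbers are hypothetical, and the list of reports is assumed to have already been filtered to entries with a positive RoB score.

```python
def weighted_score(reports):
    """Weighted score for one metabolite in one comparison.

    `reports` is a list of (R, C) pairs, one per reporting publication:
    R is the RoB score of the entry and C is the signed log2FC
    (positive = higher in HCC, negative = lower in HCC; C must be non-zero).
    """
    n = len(reports)
    # Vote count: +1 for each report of an increase, -1 for each decrease.
    votes = sum(1 if c > 0 else -1 for _, c in reports)
    # Discordance penalty: 1 when all reports agree, 0 when they split evenly.
    p = abs(votes) / n
    return p * sum(r * c for r, c in reports)


# Worked example from the text: four reports of an increase and one of a
# decrease give p = (4 - 1) / 5 = 0.6. The R and C values are hypothetical.
reports = [(8, 1.0), (6, 0.5), (10, 2.0), (4, 1.0), (5, -1.0)]
print(weighted_score(reports))
```

Replacing each C with +1 or −1 would give the vote-count variant of the score rather than the log2FC variant.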
Thus, a positive weighted score indicated a metabolite to be predominantly reported to be higher in HCC compared with control and vice versa; the higher the absolute value, the more studies reported the change, the greater the extent of change, the more consistent the reported direction of change, and/or the lower the RoB in the reporting studies. Reported metabolites in each sample type were then ranked by the absolute value of the weighted score. For urine and blood, metabolites were ranked by the sum of the weighted score across the three comparisons (HCC vs. healthy, HCC vs. noncirrhotic LD, HCC vs. cirrhosis). Data analysis and visualisation were carried out in R (version 3.6; https://community.rstudio.com/t/r-version-3-6-0-and-rtools/31418).
Results
Summary of studies included in the systematic review
A total of 2,144 nonduplicated citations were identified from Medline and EMBASE, with an additional three identified manually (Fig. 1). After excluding 2,044 citations on the basis of titles and abstracts, the full texts of 103 articles were retrieved and reviewed for eligibility. Nineteen of the studies were excluded with reasons provided (Fig. 1). Finally, a total of 84 studies (15, 21, 28–108) were included in the systematic review, of which 15, 54, and nine analyzed tissue, blood, and urine samples, respectively, with six studies presenting findings of two sample types (five blood and tissue, and one blood and urine; Fig. 2A). The number of HCC cases in the studies ranged between 5 and 361. Studies using blood samples had the highest median HCC cohort size (n = 34), compared with n = 28 for urine and n = 29 for tissue (Fig. 2B). Over half of the studies (n = 47) were conducted in China, with the United States having the second highest number of studies (n = 10; Fig. 2C). For over half of the studies included, the HCC cases had either chronic hepatitis B or C as the main underlying etiology (Fig. 2D). For studies using urine or blood samples, most had healthy volunteers or patients with cirrhosis as comparison groups. Nineteen of the 20 studies that analyzed tissue samples had matched NT tissue from patients with HCC as the control sample (Fig. 2E). LC/MS was the most common analytic method used (Fig. 2F). Details of individual studies are shown in Supplementary Table S3.
Figure 2. Summary characteristics of the included studies. A, Biosamples analyzed (T: tissue; B: blood; U: urine). B, Number of HCC cases (excluding validation cohort). C, Country where the study was conducted. D, Underlying etiology of HCC cases investigated. E, Comparison groups for each biosample type. F, Analytic platform used. Note for (B, E, and F): studies that analyzed more than one biosample type are presented twice, once in each biosample.
Assessment of RoB
Because there was no readily available RoB tool for metabonomic studies, a domain-based bespoke tool incorporating items from existing RoB tools (24, 25) along with requirements from published reporting standards in the field of metabonomics (26, 27) was developed to assess the studies reviewed (Table 1). Domains 1, 2, and 3 assessed the overall methodologic and reporting concerns of the publication and gave a maximum total of 14 points (Fig. 3). The reviewed studies scored an average of 5.1 points. The average points for studies that analyzed blood samples and tissue samples were 5.6 and 4.8, respectively. Studies with urine samples had the lowest average, 3.8 points, which was lower than the overall average (one-sample t test P = 0.0503). There was a weak positive correlation (Spearman ρ = 0.325, P = 0.0025) between the year of publication and RoB points, suggesting that the RoB of published studies has been decreasing over time.
Figure 3. RoB assessment of included studies. Individual studies were sorted according to biosample studied and total score (the higher the score, the lower the risk of bias). Reference numbers are displayed in [square brackets].
In terms of study design, 20 of the 60 studies that analyzed blood samples had an HCC group of 50 or more cases. Only 1 of 20 studies that analyzed tissue and 3 of 10 studies that analyzed urine had 50 or more HCC cases. Nearly a quarter of the studies (20/84, 24%) did not report basic demographic or clinical characteristics of participants separately for each study group. There were also 20 of 84 studies (24%) that did not state the diagnostic method used for HCC cases. Twenty-five studies (30%) discussed how potential confounders (age, sex, or underlying etiology) may have affected findings, of which 21 had taken confounders into account in their statistical analysis, using adjusted models or by demonstrating that potential confounders had no effect on findings. A total of 21 studies (25%) had independent validation cohorts, of which 19 validated findings from the analysis of blood samples, and one each from tissue and urine.
Concerns in Domain 2 regarding chemical analysis included items 2.2 and 2.4. Only 22 of 84 (26%) papers explicitly stated that samples were randomized. A total of 44 of 79 studies (56%) stated use of a pooled quality control (QC) sample, of which 18 showed reproducibility of QCs across the analysis (five of the studies used targeted LC/MS analysis only, for which QCs were not required). Of particular concern in Domain 3 on data analysis was the small proportion (23/84, 27%) of publications implementing multiple testing correction (109, 110) for determining statistical significance.
Domain 4 (Table 1), concerning the reporting of discriminatory metabolites, was metabolite specific, as the score differed for each metabolite reported in each publication. For example, the highest level (level 1) of confidence in metabolite identification (27), whereby the metabolite is structurally elucidated on the basis of robust analytical evidence from more than one analytic platform with verification using an authentic compound, may only have been achieved for a subset of reported metabolites. Fewer than half of the studies (40/84, 48%) had any discriminant metabolites identified with level 1 confidence, suggesting that the majority of studies relied on metabolite identification based only on matching to reference databases.
Discriminatory metabolites between HCC and comparison groups
A total of 2,302 entries of differential metabolites were extracted from 84 studies reviewed. The top 30 metabolites with the highest absolute values in weighted score as well as metrics from intermediate steps leading to the final weighted score are listed in Supplementary Tables S4–S6.
There were 699 entries of reported discriminant metabolites of 476 unique compounds in tissue, 684 of which were based on the comparison of HCC tumor (T) versus matched NT liver tissues and 15 others were from comparisons between tumor and nonmatched healthy liver tissues. Because of the paucity of data from nonmatching comparisons, we focused only on T versus matched NT in the analysis. Of the 684 entries, 275 entries reported upregulated metabolites, whereas 409 were downregulated. A total of 288 entries reported fold change. The median log2FC for upregulation was 0.807 and −0.811 for downregulation. These values were used as estimates in the weighted score calculation for entries without reported fold change values.
For blood, there were 1,376 entries of 590 unique compounds. There were 410, 125, and 841 entries for HCC versus cirrhosis, HCC versus noncirrhotic LD, and HCC versus healthy control, respectively. The median log2FC values of upregulated and downregulated entries were 0.516 (based on 121 entries) and −0.515 (based on 115 entries) for HCC versus cirrhosis, and 0.872 (based on 127 entries) and −0.737 (based on 212 entries) for HCC versus healthy. For HCC versus LD, because there were only nine entries with fold change, the medians of all log2FC values reported in blood were used as estimates in weighted score calculations instead (0.669 for increased metabolites in HCC, −0.655 for decreased metabolites).
There were 222 entries of 126 unique compounds reported in urine. This includes 73 entries of HCC versus cirrhosis, 40 entries of HCC versus LD and 109 entries of HCC versus healthy control. Because of the lack of studies reporting fold change values (a total of 31 entries from a single publication only), median of log2FC from these 31 (0.614 for upregulated metabolites, −1.03 for downregulated metabolites) were used for weighted score calculation for all urine entries, regardless of comparison.
Synthesis results of discriminatory metabolites
For each entry of discriminant metabolites reported, the log2FC (or an estimate thereof) was scaled by the RoB score of the reporting publication. Subsequently, all reports of a metabolite in a comparison were summed. Finally, to further penalize metabolites for which significant changes were reported in opposite directions, the sum was weighted by a penalty, corresponding to the difference between the numbers of reports in each direction as a fraction of all reports, to produce the final weighted score. Using this approach, the ranked list produced from the comparison of HCC T versus matched NT showed that the highest-ranking metabolites included glycerol 3-phosphate, malic acid, and niacinamide, all of which were decreased in tumor tissues (Fig. 4A). Other high-ranking metabolites included bile acids (glycocholic acid, glycochenodeoxycholic acid, and glycodeoxycholic acid), all of which were decreased in tumor tissues, and free fatty acids (including C16:1, C18:2n6,9), lysophosphocholines [including LPC(18:2), LPC(16:1), etc.], and acylcarnitines (C3:0 carnitine, C4-OH carnitine, etc.), which had different directions of change depending on chain length and the number of double bonds.
Figure 4. Top 30 metabolites ranked by |weighted score| in each sample type. A, Tissue. B, Blood. C, Urine. A positive value suggests that the consensus among the reports is that the metabolite is higher in HCC than in the comparison group, and vice versa. The weighted score combines log2(FC) values, the RoB of the reporting publication, and a penalty for contradictory reports of direction of change. LD, noncirrhotic liver disease.
For metabolites in urine or blood, ranking was based on the sum of the weighted scores of all three comparisons (HCC vs. healthy, HCC vs. LD, HCC vs. cirrhosis). This was based on the assumption that an ideal metabolite should discriminate HCC from all three comparison groups. Hence, this method favored metabolites that discriminated strongly, and in the same direction, between HCC and all three groups.
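As a rough sketch of this ranking step, the snippet below sums per-comparison weighted scores and ranks by absolute value. The metabolite names and scores are invented, the function names are hypothetical, and treating a comparison with no report as contributing zero is an assumption rather than something stated in the review.

```python
# Invented per-comparison weighted scores for three made-up metabolites.
scores = {
    "metabolite_A": {"healthy": 2.0, "LD": 1.5, "cirrhosis": 1.0},
    "metabolite_B": {"healthy": 3.0, "cirrhosis": -2.5},
    "metabolite_C": {"healthy": -1.0, "LD": -1.0, "cirrhosis": -1.5},
}

COMPARISONS = ("healthy", "LD", "cirrhosis")


def combined_score(per_comparison):
    """Sum the weighted scores over the three comparisons."""
    # ASSUMPTION: a comparison with no report contributes 0.
    return sum(per_comparison.get(c, 0.0) for c in COMPARISONS)


def rank_metabolites(all_scores):
    """Return (metabolite, combined score) pairs, largest |score| first."""
    combined = {m: combined_score(s) for m, s in all_scores.items()}
    return sorted(combined.items(), key=lambda kv: abs(kv[1]), reverse=True)


ranking = rank_metabolites(scores)
for name, total in ranking:
    print(name, total)
```

Here metabolite_B has the largest single score but ranks last, because its opposite directions in the healthy and cirrhosis comparisons largely cancel; this is the behavior the summation is intended to produce.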
The three highest-ranking metabolites in blood were primary bile acids: glycocholic acid, taurocholic acid, and taurochenodeoxycholic acid (Fig. 4B). However, the pattern of a high positive weighted score in HCC versus healthy but a negative score in HCC versus cirrhosis suggested that levels of these bile acids in HCC lay between those in healthy individuals and those in patients with cirrhosis. Other high-ranking metabolites (gluconic acid, hypoxanthine) had a high weighted score for one comparison but a low score for another, indicating a different degree of change depending on the comparison group. Only a few metabolites, such as trimethylamine-N-oxide (TMAO) and 2-hydroxybutyric acid, had similar scores for both the HCC versus healthy comparison and the HCC versus cirrhosis comparison, indicating that the evidence of change in HCC was similar relative to healthy individuals and to patients with cirrhosis.
In urine, the weighted scores had lower values, owing to the smaller number of studies investigating urine (Fig. 4C). High-ranking metabolites included creatinine, hippuric acid, and TMAO, all of which were detected at lower concentrations in HCC. Unlike in blood, all urine metabolites with reports for more than one comparison showed a uniform direction of change across comparisons, although the extent of change may have differed.
Discussion
The aim of this systematic review was to comprehensively compile a database of all reported discriminatory metabolites for HCC compared with comparison groups in blood, urine, and tissue, and to identify any metabolites that may be potential biomarkers that should be investigated in future studies. A total of 84 publications were identified as eligible for inclusion in the review after a two-stage screening process (title and abstract screening, followed by full-text screening). Data were extracted from each eligible publication and RoB was assessed using a bespoke tool for metabonomic studies, which was developed on the basis of existing RoB tools and minimal reporting standards specific to metabonomic studies. Finally, a weighted score system was implemented to rank metabolites according to: the frequency with which they were reported to be significantly dysregulated in HCC; the extent of change; consistency in the reported direction of change; and the RoB of the reporting publications. Using this approach, a ranked list of metabolites discriminatory for HCC was produced for each sample type. Although no single metabolite or combination of metabolites could be definitively identified as a potential diagnostic marker, this body of work produced a resource for future research on this topic.
The systematic extraction of data and the assessment of RoB in this review highlighted heterogeneity in published study design and incomplete reporting of essential aspects of the relevant metabonomic studies. As the metabolome is prone to influence from dietary intake and time of day (111), random sampling of biofluids obtained from nonfasted individuals may complicate data mining processes. For the majority of studies, the failure to report sample randomization (the absence of which may introduce bias from batch effects; 26) and the performance of quality control samples prevented assessment of analytical reproducibility. Given that the analytic methods used in metabonomic studies allow measurement of many compounds at once, the number of variables tested was usually far greater than the number of observations. Where the number of variables exceeds the number of samples (n >> K), results are prone to false positives and necessitate the use of multiple testing corrections when testing for significance of discriminant variables (candidate biomarkers; ref. 112). However, such corrections were adopted in only 27% of the studies reviewed. Finally, reliance on online or in-house databases for metabolite identification, without confirmation using chemical standards or orthogonal supporting evidence, risks erroneous assignments (112). Taken together, these factors highlight the necessity for future studies to adhere to standardized study design or, at minimum, to reporting that meets minimal standards to reduce risks of bias.
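To illustrate why multiple testing correction matters when many metabolites are tested at once, the sketch below applies the Benjamini-Hochberg false discovery rate adjustment to a small set of invented p-values. The review does not state which correction the included studies used, so the choice of Benjamini-Hochberg here is an assumption made for illustration, and `benjamini_hochberg` is a hypothetical helper rather than a function from any particular library.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Invented p-values for five hypothetical metabolites.
p_values = [0.001, 0.008, 0.039, 0.041, 0.6]
adjusted = benjamini_hochberg(p_values)

raw_hits = sum(p <= 0.05 for p in p_values)
corrected_hits = sum(q <= 0.05 for q in adjusted)
print(adjusted)
print(raw_hits, corrected_hits)
```

In this toy example, four of the five metabolites pass an uncorrected 0.05 threshold but only two remain after adjustment, which is the kind of reduction in false positives that uncorrected studies forgo.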
As a result of heterogeneity in the way studies were conducted, it was necessary to exercise caution in interpreting the literature. The weighted score system developed for this review was based on the assumption that the more often a metabolite was found to be significantly changed, and the more consistently the direction of change was reported in different cohorts, the more likely it is that the metabolite could be a marker of clinical utility. The weighted score system was an improvement over a previous synthesis of metabonomics studies (113), in which only vote counting was used, that is, only the number of reports with each direction of change, without taking into account the RoB of the reports or the extent of change observed. The log2FC, or an estimate of it using the median of reported fold change values, was used in place of a simple vote count to incorporate the extent of change. To take into account the RoB of studies and to minimize bias against larger studies, which are fewer in number but have greater statistical power (114), the RoB score was used to scale the log2FC. Finally, a penalty was applied for metabolites with reports of significant changes in opposite directions.
The resulting ranked lists of metabolites associated with HCC for each sample type [tissue, blood (serum or plasma), and urine] display metabolites that were reported to be changed consistently in multiple studies (Fig. 4; Supplementary Tables S4–S6). In tissue, all top 30 metabolites had 100% concordance in the reported direction of change (discordance penalty value of 1, that is, no penalty applied). Such a high degree of agreement is promising in that it suggests similar patterns of change in tumor tissues across different cohorts, which warrants further biological interpretation and investigation to gain a mechanistic understanding of the disease. For both blood and urine, most metabolites do not have reports for all three comparisons (HCC vs. cirrhosis, HCC vs. liver disease, and HCC vs. healthy control), and some metabolites show different directions (e.g., several bile acids in blood) and extents of change in different comparisons (e.g., hypoxanthine in blood and creatinine in urine). Therefore, owing to insufficient evidence and a lack of coherence across studies, no definitive noninvasive markers can be derived at this stage.
Despite the inconclusive finding, the ranked lists of discriminatory metabolites provide important insight for informing future research. Findings from the studies reviewed revealed that metabolites involved in various metabolic processes are changed in HCC. This suggests that there is no shortage of discriminatory metabolites; the key question in this field of research is which single metabolite or panel of metabolites can best serve as a diagnostic marker for HCC. This selection process should be informed by three biological considerations.
First, candidate markers should reflect HCC tumorigenesis, rather than a secondary effect of HCC development. Therefore, there should be evidence that any potential marker found in biofluids originated from the tumor. To this end, the strategy of testing for concurrent changes in tissue and in circulation, as adopted by four of the studies reviewed (53, 54, 73, 74), is one approach. Among these four studies, one discovered candidate metabolites in serum and then validated their biological relevance in tissue samples, while another two investigated potential metabolites in tissue samples and then validated them in serum. Only one study looked for metabolites in both biofluid and tissue samples at the outset. It should be noted that the candidate biomarkers found in serum (acetylcarnitine, propionylcarnitine, and betaine) belonged to very different chemical classes from those identified in tissue (phenylalanyl-tryptophan and glycocholate). However, because studies with such a design are relatively scarce, comparing metabolites reported in tissue studies with those reported in biofluid studies serves as a good starting point. Differences in metabolite levels in tumor tissue compared with differences in blood may be affected by various processes including uptake, secretion (or release due to apoptosis), synthesis, degradation, or other metabolic reactions at the cellular level. Different patterns of alteration in tissues and biofluids may reflect different tumorigenesis of various etiologies. A metabolite that is found to be higher in tumor tissue and higher in circulation may suggest heightened synthesis accompanied by release. Unfortunately, the two highest-ranking metabolites found to be higher in tumor tissue compared with matched NT tissue, O-phosphoethanolamine and 5′-methylthioadenosine, have not been reported in blood studies. On the other hand, a metabolite found to be increased in tumor tissue and decreased in circulation may reflect increased uptake. l-Glutamine is one example (Supplementary Table S5). This is supported by the well-established understanding that cancer cells (including HCC) rely on l-glutamine as an energy source, with increased uptake through the upregulation of the glutamine transporter ASCT2 (115, 116). Metabolites found to be lower in both tumor tissue and blood in patients with HCC, such as malate, a tricarboxylic acid intermediate, may reflect downregulation of the relevant pathway. Alternatively, for compounds known to be synthesized and secreted by hepatocytes, lower levels in both the blood of patients with HCC and tumor tissue may suggest a failure of tumor cells to maintain their synthetic functions, leading to a lower overall level in circulation. Fibroblast growth factor 19, which is upregulated in cholestatic and cirrhotic conditions, downregulates bile acid synthesis and promotes tumorigenesis in the liver (117). This supports the observation that the primary bile acid, glycocholic acid, was found to be lower both in tumor tissue and in the blood of patients with HCC. Finally, for metabolites found to be lower in T tissue but increased in circulation, such as myo-inositol and l-carnitine, the likelihood of their change in blood being a direct effect of HCC is low unless there is active heightened secretion, evidence of which is lacking. The metabolites discussed above may be of greater interest for follow-up in future blood studies because of their concurrent reports of significant change in previous blood and tissue studies. In addition to demonstrating high sensitivity and specificity, an ideal biomarker should be obtained from a noninvasive or minimally invasive sample, be cheap to measure, and be analytically robust. Nevertheless, tissue biomarkers can deliver mechanistic information and may lead to the development of assays for the same analyte in less invasive samples.
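The four tissue-versus-blood patterns discussed above can be summarized as a small lookup table. The sketch below is only a restatement of that reasoning in Python: the function and variable names are hypothetical, the interpretation strings paraphrase the text, and the numeric scores passed in the example are invented.

```python
# Keys: (direction in tumor tissue, direction in blood).
# Values paraphrase the interpretations given in the text.
INTERPRETATION = {
    ("up", "up"): "heightened synthesis accompanied by release",
    ("up", "down"): "increased uptake by the tumor",
    ("down", "down"): "downregulated pathway or loss of synthetic function",
    ("down", "up"): "blood change unlikely to be a direct effect of HCC",
}


def direction(score):
    """Map a signed weighted score to 'up', 'down', or None if zero."""
    if score > 0:
        return "up"
    if score < 0:
        return "down"
    return None


def interpret(tissue_score, blood_score):
    """Look up the suggested interpretation for a tissue/blood score pair."""
    key = (direction(tissue_score), direction(blood_score))
    return INTERPRETATION.get(key, "insufficient evidence")


# Invented scores following the pattern described for l-glutamine:
# higher in tumor tissue, lower in circulation.
print(interpret(1.2, -0.7))
```

Such a table is a starting point for prioritization only; each interpretation remains a hypothesis that requires direct experimental support.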
A further caveat when evaluating the utility of tissue biomarkers in the diagnosis of HCC, or any other disease, is that studies with tissue biopsies tend to include fewer participants because of the invasive nature of sample collection, and therefore statistical power may be substantially reduced. The utility of such studies thus lies more in their potential to link circulating or urinary biomarkers to mechanistic pathways.
Second, candidate markers should be specific to HCC, rather than being markers of liver damage. Therefore, ideally, a metabolite should display altered levels only in HCC, and not in patients with cirrhosis; that is, the extent and direction of change should be similar across the three comparisons (HCC vs. healthy, HCC vs. LD, HCC vs. cirrhosis). However, many metabolites reported in the studies reviewed, for example, phenylalanyl-tryptophan (phe-trp), show a progressive decrease across the healthy, cirrhosis, and HCC groups (74). Among the top-ranking metabolites, the three highest-ranking in blood, the primary conjugated bile acids glycocholic acid, taurocholic acid, and taurochenodeoxycholic acid, show the most notable patterns of alteration across the three comparisons. Their increase compared with healthy controls is likely due to cholestasis, which is frequent in patients with liver disease, while their decrease compared with patients with cirrhosis likely reflects the reduced capacity of livers with HCC to synthesize them, as discussed earlier. An alternative source of dysregulation of these bile acids may involve the microbiome, as the gut-liver axis is disrupted in HCC, leading to a decrease in primary and an increase in secondary bile acids (118). These bile acids also have the capacity to alter hepatic expression of CXCL16 and CXCL16-mediated natural killer T-cell recruitment (118). Alteration of other microbially generated metabolites such as TMAO, hippurate, and 5-hydroxyindoleacetic acid in HCC also points to the involvement of the microbiome more generally in liver disease. Because the levels of these discriminatory bile acids are intermediate between those of healthy individuals and patients with cirrhosis, their applicability as biomarkers, at least on their own, is limited despite their top ranking according to the scoring metric applied.
Given that 80% of HCCs develop in patients with cirrhosis (13) and cirrhosis alone accounts for substantial metabolic changes in the body (119), future research efforts should focus on the HCC versus cirrhosis comparison and only use the other two comparisons to confirm findings. This strategy may help avoid identifying metabolites that are markers of liver damage, rather than markers specific for HCC.
Third, the marker should be universal, that is, valid independent of genetic, environmental, dietary, or etiologic factors. For this, validation studies should be conducted in cohorts in different geographical locations and with different underlying etiologies (Fig. 2C and D). The study conducted by Luo and colleagues (74) is the largest serum study to date and concluded that phe-trp and glycocholic acid were markers for delineating HCC from cirrhosis. Although that study included a validation cohort to confirm the findings, phe-trp has been reported in only one other study (48). Future profiling efforts should also specifically target these previously shortlisted metabolites to confirm their validity in different cohorts.
In addition to biological considerations, technical and practical considerations should be taken into account for HCC marker selection and validation. In terms of technical considerations, if a panel, rather than a single metabolite, is necessary to perform as a highly accurate diagnostic test, efforts should aim to minimize the number of compounds on the panel and take into account ease of detecting all marker compounds simultaneously and accurately in a single assay. As for practicality, given that HCC has the highest incidence in areas with limited resources (9, 10), cost, resource availability and logistics should be considered. Although a urinary test may be more easily implemented than a blood test, the presence of candidate biomarkers in blood versus urine should be considered.
As illustrated above, future research efforts on this topic should be guided by existing evidence and informed by biological understanding. Unlike clinical chemistry assays, which largely follow the same format, metabolic profiling can be conducted on a wide variety of platforms, each with its own strengths and limitations. Therefore, it is important to adhere to the ‘minimum’ reporting standards published by the chemical analysis working group of the Metabolomics Standards Initiative (25, 26), which set recommendations to improve confidence in metabolites identified across multiple analytic platforms and assays. In addition to standardization of study design and adherence to minimal reporting standards, future metabonomic studies should be designed in a hypothesis-driven manner with the considerations discussed above taken into account. The systematically compiled ranked lists of metabolites presented here for each sample type (Fig. 4; Supplementary Tables S4–S6) provide an informative resource for selecting metabolites to be further investigated.
Although the bespoke tool for assessing RoB should be of use in defining robust biomarkers for HCC, and perhaps other conditions, a limitation of this review is that the tool has not yet been validated in an independent systematic review. This tool for assessing RoB was primarily developed to overcome the lack of an existing one for metabonomic studies. It addresses inherent biases in the analysis and reporting of differential metabolites but does not accommodate inherent biases in study design, such as the appropriate choice and source of control participants. The weighting of domains and items according to their relative contribution to a publication's RoB could be discussed and adjusted. However, this tool was developed by modifying existing tools (24, 25), with items for minimal reporting standards of metabonomic studies (9, 10) incorporated in an iterative process, and the final version was a consensus reached by all authors. Similarly, the weighted score system for ranking discriminatory metabolites was the first of its kind. It was developed to circumvent the heterogeneity in study design, chemical analysis, and data analysis across studies, which presents a major challenge for synthesizing the findings. Rankings based on the final score, on the primary data, and on the intermediate steps are provided for reference (Supplementary Tables S4–S6).
Another limitation is that the studies reviewed were subject to publication and observer biases. All studies reported one or more statistically significant discriminatory metabolites between HCC and comparison group(s). Even the studies excluded at the full-text screening step were not excluded because of an absence of discriminatory metabolites. Rather, they were, for example, publications focused on targeted assay development in which statistical tests were not reported, so confidence in the results could not be ascertained. A limitation of metabonomics, unlike some other “omics,” is that the choice of analytic method or assay affects the collection of compounds detected or measured because of the chemical diversity of the metabolome. As such, the dominance of bile acids or acylcarnitines may be due to investigators’ choice of targeted methods for these compounds. Because the scoring system favors the number of coherent reports, it may be biased toward metabolites that were selectively investigated in more studies. However, because investigators likely chose to assay certain classes of metabolites for well-supported reasons, the resulting ranked list based on the weighted score should still be valid.
The application of metabonomics to clinical questions has been heralded as a promising field for providing novel biomarkers for diagnostic and prognostic purposes. However, to date, contributions from the field have not been translated into new biomarkers endorsed by clinical guidelines for use in the clinic. The main question of this review, seeking to identify novel diagnostic biomarkers for HCC, is among the most popular clinical questions researched in metabonomics. Along with the need to validate existing evidence with biological, technical, and practical considerations taken into account, the standardization of study design and adherence to minimal reporting standards are crucial for the field to move beyond exploratory studies to phase two clinical trials.
Conclusions
A new tool to identify potential diagnostic biomarkers for HCC taking into account the quality of the primary studies reporting candidate metabolites of HCC was developed and applied. While there was not any metabolite that could be definitively concluded to be a good candidate biomarker, this review has led to a systematic compilation of reported discriminatory metabolites which offers a valuable resource for guiding future research on this topic. Validation studies, standardized study designs, and publications meeting minimal reporting standards are crucial for advancing the field beyond exploratory studies.
Authors' Disclosures
E.Y.-L. Shen reports grants from Chang Gung Medical Research outside the submitted work. M.R. Thursz reports grants from NIHR-Imperial BRC during the conduct of the study. No disclosures were reported by the other authors.
Acknowledgments
The authors would like to thank Rebecca Jones for her advice and feedback on the development of the database search strategies. We thank the UK National Institute for Health Research (NIHR), the UK Biotechnology & Biological Sciences Research Council (BBSRC, BB/N016847/1) for funding (E. Holmes), and the Department of Jobs, Tourism, Science and Innovation, Government of Western Australian Premier's Fellowship and ARC Laureate Fellowship funding (E. Holmes). We also thank Chang Gung Medical Research Grant (CMRPG3G1211, CMRPG3G1212, and CMRPG3G1213) for funding (E.Y.-L. Shen).
M.R.A. U is an Imperial College London President's PhD Scholar. E.Y.-L. Shen was funded by Chang Gung Medical Research Grant (CMRPG3G1211, CMRPG3G1212, and CMRPG3G1213) for his PhD at Imperial College London. A. Alkhatib was funded by a grant from the Newton-Mosharafa Fund (Cairo, Egypt). S.D. Taylor-Robinson was funded by grants from the Newton Fund and the Wellcome Institutional Strategic Support Fund at Imperial College London. All authors are grateful to the UK NIHR Biomedical Facility at Imperial College London for infrastructure support. The authors received no financial support for conducting this systematic review, and the funding bodies of the authors had no influence on the study.
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.