Abstract
Background: Metabolomics plays an important role in providing insight into the etiology and mechanisms of hepatocellular carcinoma (HCC) through comprehensive analysis of the patterns of metabolic alteration in human specimens. This study compares the levels of plasma metabolites in HCC cases versus cirrhotic patients and evaluates the ability of candidate metabolites to distinguish the two groups. It also investigates the combined use of metabolites and clinical covariates for detection of HCC in patients with liver cirrhosis.
Methods: Untargeted analysis of metabolites in plasma from 128 subjects (63 HCC cases and 65 cirrhotic controls) was conducted using gas chromatography coupled to mass spectrometry (GC-MS). This was followed by targeted evaluation of selected metabolites. LASSO regression was used to select a set of metabolites and clinical covariates that are associated with HCC. The performance of candidate biomarkers in distinguishing HCC from cirrhosis was evaluated through a leave-one-out cross-validation based on area under the receiver operating characteristics (ROC) curve.
Results: We identified 11 metabolites and three clinical covariates that differentiated HCC cases from cirrhotic controls. Combining these features in a panel for disease classification using support vector machines (SVM) yielded better area under the ROC curve compared with alpha-fetoprotein (AFP).
Conclusions: This study demonstrates the combination of metabolites and clinical covariates as an effective approach for early detection of HCC in patients with liver cirrhosis.
Impact: Further investigation of these findings may improve understanding of HCC pathophysiology and possible implication of the metabolites in HCC prevention and diagnosis. Cancer Epidemiol Biomarkers Prev; 26(5); 675–83. ©2016 AACR.
Introduction
Hepatocellular carcinoma (HCC) is the fifth most common cancer in the world and the third leading cause of cancer mortality worldwide (1). An estimated 35,660 new cases of liver cancer (including intrahepatic bile duct cancers) were expected to occur in the United States during 2015, approximately three-fourths of them HCC (2). The survival rate of patients with HCC is still 5% (3), and it can only be significantly improved if diagnoses are made at earlier stages, when treatment is more effective. Ultrasonography, performed every 6 months, is the currently recommended screening and surveillance method for patients with established liver cirrhosis (4). The diagnosis of HCC by imaging techniques requires availability of equipment and correct interpretation of the results, both of which are limited in regions with high HCC burden (5). Other than liver imaging and histology, current diagnosis of HCC relies on measurement of the serum biomarker α-fetoprotein (AFP). However, the sensitivity and specificity of AFP are not sufficient for diagnosis of HCC, as elevated AFP levels may also be seen in patients with cirrhosis or chronic hepatitis (5). Different variants of AFP such as AFP-L1, AFP-L2, and AFP-L3 have been studied to improve its diagnostic performance (6). Des-gamma-carboxy-prothrombin (DCP) has also been investigated as a potential biomarker for HCC (7). However, reliable serologic biomarkers for early detection of HCC in the high-risk population of cirrhotic patients are yet to be found and validated.
Metabolomics has been broadly used for biomarker discovery for many human diseases, including cancer (8). It provides simultaneous assessment of a broad range of metabolites. In this article, we evaluate the levels of plasma metabolites measured by gas chromatography coupled with selected ion monitoring mass spectrometry (GC-SIM-MS), combined with clinical covariates in detecting early-stage HCC cases in patients with liver cirrhosis recruited at MedStar Georgetown University Hospital (MGUH, Washington, DC). Metabolites and clinical covariates relevant for detecting HCC in cirrhotic patients were selected through least absolute shrinkage and selection operator (LASSO) logistic regression (9). We observed that the combination of LASSO-selected metabolites and AFP, Child–Pugh score, and etiologic factors leads to improved area under the ROC curve compared with AFP. We used correlation and network analyses to evaluate any associations among the selected metabolites and clinical covariates. Finally, we performed pathway enrichment analysis to examine the biological meaning of the results.
Materials and Methods
Study cohort and sample collection
Adult patients were recruited from the hepatology clinic at MGUH. The characteristics of 128 patients investigated in this study are summarized in Table 1. All participants provided informed consent to the study approved by the Institutional Review Board at Georgetown University. The patients were diagnosed to have liver cirrhosis on the basis of established clinical, laboratory, and/or imaging criteria. Cases were diagnosed to have HCC based on well-established diagnostic imaging criteria and/or histology. Clinical stages for HCC cases were determined on the basis of the tumor–node–metastasis (TNM) staging system. Controls were required to be HCC free for at least 6 months from the time of study entry.
| Variable | Level or statistic | Case, N = 63 (%) | Control, N = 65 (%) | P^a |
|---|---|---|---|---|
| Age | Mean (SD) | 60.0 (6.4) | 58.6 (7.2) | 0.2561 |
| Race | African American | 17.0 (27.0) | 15.0 (23.1) | 0.3938 |
| | White | 33.0 (52.4) | 43.0 (66.2) | |
| | Asian | 6.0 (9.5) | 2.0 (3.1) | |
| | Hispanic/Latino | 4.0 (6.3) | 2.0 (3.1) | |
| | Other | 3.0 (4.8) | 3.0 (4.6) | |
| Gender | Female | 18.0 (28.6) | 19.0 (29.2) | 1.0000 |
| | Male | 45.0 (71.4) | 46.0 (70.8) | |
| Smoker | Current | 14.0 (22.2) | 15.0 (23.1) | 1.0000 |
| | Former | 31.0 (49.2) | 32.0 (49.2) | |
| | None | 18.0 (28.6) | 17.0 (26.2) | |
| Alcohol | Current | 8.0 (12.7) | 11.0 (16.9) | 0.7809 |
| | Former | 33.0 (52.4) | 34.0 (52.3) | |
| | None | 21.0 (33.3) | 19.0 (29.2) | |
| BMI | Mean (SD) | 30.2 (6.6) | 29.2 (6.3) | 0.3838 |
| Diabetes | No | 39.0 (61.9) | 40.0 (61.5) | 1.0000 |
| | Yes | 24.0 (38.1) | 23.0 (35.4) | |
| Family history of cancer | No | 25.0 (39.7) | 24.0 (36.9) | 0.6129 |
| | Unknown | 2.0 (3.2) | 5.0 (7.7) | |
| | Yes | 36.0 (57.1) | 35.0 (53.8) | |
| Etiology | Alcohol | 17.0 (27.0) | 25.0 (38.5) | 0.0210 |
| | Autoimmune | 2.0 (3.2) | 1.0 (1.5) | |
| | Cryptogenic | 1.0 (1.6) | 0.0 (0.0) | |
| | HBV | 9.0 (14.3) | 1.0 (1.5) | |
| | HCV | 39.0 (61.9) | 29.0 (44.6) | |
| | NAFLD | 4.0 (6.3) | 3.0 (4.6) | |
| | Other | 2.0 (3.2) | 6.0 (9.2) | |
| | PBC | 0.0 (0.0) | 3.0 (4.6) | |
| | PSC | 2.0 (3.2) | 4.0 (6.2) | |
| HCV Ab | Negative | 24.0 (38.1) | 34.0 (52.3) | 0.0860 |
| | Positive | 37.0 (58.7) | 27.0 (41.5) | |
| Anti HBc | Negative | 30.0 (47.6) | 40.0 (61.5) | 0.1822 |
| | Positive | 27.0 (42.9) | 19.0 (29.2) | |
| | Unknown | 1.0 (1.6) | 1.0 (1.5) | |
| HBsAg | Negative | 49.0 (77.8) | 55.0 (84.6) | 0.1247 |
| | Positive | 8.0 (12.7) | 3.0 (4.6) | |
| Ascites | No | 37.0 (58.7) | 21.0 (33.3) | 0.0038 |
| | Yes | 24.0 (38.1) | 42.0 (66.7) | |
| AST | Median (IQR) | 83.0 (74.0) | 67.0 (52.5) | 0.1216 |
| ALT | Median (IQR) | 70.0 (67.0) | 49.5 (36.8) | 0.0009 |
| AFP | Median (IQR) | 28.8 (102.1) | 4.5 (11.0) | 0.0000 |
| MELD | Median (IQR) | 10.0 (5.0) | 14.0 (7.0) | 0.0000 |
| | ≤10 | 30.0 (47.6) | 10.0 (15.4) | 0.0000 |
| | >10 | 28.0 (44.4) | 54.0 (83.1) | |
| | Mean (SD) | 11.4 (4.1) | 16.2 (13.8) | 0.0087 |
| Stage | I | 37.0 (58.7) | | |
| | II | 20.0 (31.7) | | |
| | III | 6.0 (9.5) | | |
| HCV RNA | Median (IQR) | 350800.0 (856449.0) | 293900.0 (878781.4) | 0.7622 |
| | >281 | 27.0 (42.9) | 16.0 (24.6) | 0.3415 |
| | ≤281 | 5.0 (7.9) | 7.0 (10.8) | |
| Child–Pugh score | Median (IQR) | 7.0 (3.0) | 9.0 (3.0) | 0.0001 |
| | Mean (SD) | 7.1 (2.1) | 8.8 (2.4) | 0.0001 |
| Child–Pugh grade | A | 24.0 (38.1) | 9.0 (13.8) | 0.0011 |
| | B | 23.0 (36.5) | 34.0 (52.3) | |
| | C | 7.0 (11.1) | 18.0 (27.7) | |
^a Fisher exact test was used for categorical variables. Wilcoxon rank sum test was used for continuous variables that were not symmetrically distributed.
A single blood sample was drawn by peripheral venipuncture into a 10 mL BD Vacutainer sterile vacuum tube containing EDTA anticoagulant. The blood was immediately centrifuged at 1,000 × g for 10 minutes at room temperature. The plasma supernatant was carefully collected and centrifuged at 2,500 × g for 10 minutes at room temperature. After aliquoting, plasma was kept frozen at −80°C until use.
Chemicals and reagents
Deuterium-labeled internal standards were purchased from CDN Isotopes: tyrosine-d2 (D-1611), l-glutamic-2,3,3,4,4-d5 acid (D-899), l-alanine-2,3,3,3-d4 (D-1488), and l-phenyl-d5-alanine-2,3,3-d3 (D-1241). Glycine-d5 (175838), myristic acid-d27 (366889), the alkane standard mixture (68281), the fatty acid methyl ester (FAME) standards C8 (260673), C9 (245895), C10 (299030), C12 (234591), C14 (P5177), C16 (P5177), C18 (S5376), C20 (10941), C22 (11940), C24 (87115), C26 (H6389), and C28 (74701), methoxyamine hydrochloride (226904), and pyridine (360570) were purchased from Sigma-Aldrich; the C30 FAME standard was purchased from TCI Chemicals (T0812). MSTFA (TS-48910) was purchased from Thermo Scientific. HPLC-grade 2-propanol, acetonitrile, and water were used for the extraction of metabolites. Helium was purchased from Robert Oxygen.
Experimental design and quality assessment
Among the 128 participants, plasma samples collected from 120 subjects (60 HCC cases and 60 patients with liver cirrhosis) were used for untargeted analysis and plasma samples from 84 subjects (40 HCC cases and 44 patients with liver cirrhosis) were used for targeted analysis, with an overlap of 74 participants between the two analyses. Plasma samples were divided into batches, with balanced proportions of cases and controls by clinical covariates, age, sex, and ethnicity, to allow adequate time intervals between sample derivatization and GC-MS instrument calibration prior to each analysis. For the untargeted experiment, samples were split into three batches of 40 samples each, while for targeted analysis, samples were divided into two batches of 41 and 43 samples, respectively. To monitor the system's stability and performance, quality assurance procedures were applied as follows. First, a retention index (RI) standard mixture was run at the beginning and the end of each batch for retention index calibration. The standard mixture was prepared by mixing a series of fatty acid methyl esters (FAME, C8-C30) and Alkanes (C10-C40), as described previously (10). Then, blank samples were prepared together with the patient samples by adding the derivatization reagents to an empty tube and following the same steps, to monitor possible contaminations and background ions introduced by the derivatization process. Finally, quality control (QC) samples were prepared by taking 10 μL of each derivatized sample within the batch and run at the beginning of each batch for system equilibrium, in between runs and at the end for quality assessment.
Metabolite extraction
Plasma metabolites were extracted by adding 1 mL of a working solution of acetonitrile, isopropanol, and water (3:3:2) to 30 μL of plasma. The working solution contained isotope-labeled internal standards (tyrosine-d2, l-glutamic-2,3,3,4,4-d5 acid, l-alanine-2,3,3,3-d4, l-phenyl-d5-alanine-2,3,3-d3, glycine-d5, and myristic acid-d27) at a concentration of 1.25 μg/mL, which were used to evaluate the quality of metabolite extraction. After vortexing, samples were centrifuged at 14,500 × g for 15 minutes at room temperature. The supernatant was then split into aliquots of 460 μL for subsequent untargeted and targeted analyses by GC-MS. Each aliquot was concentrated to dryness in a SpeedVac, and the dried samples were kept at −20°C until derivatization prior to analysis by GC-MS. For derivatization, 20 μL of a 20 mg/mL solution of methoxyamine hydrochloride in pyridine was added to the dried extracts, which were then vortexed and incubated at 30°C for 90 minutes. After the samples had returned to room temperature, 80 μL of MSTFA was added, and the samples were vortexed and incubated at 30°C for 30 minutes. Samples were then centrifuged at 14,500 rpm for 15 minutes, and 60 μL of the supernatant was transferred into 250 μL clear glass autosampler vials.
Data acquisition and preprocessing
Untargeted metabolomic data were acquired by analyzing metabolites extracted from the plasma samples. The analysis was carried out using two GC-MS systems operated at full scan: a GC-qMS (Agilent Technologies 5975C MSD coupled to an Agilent Technologies 7890A GC) and a GC-TOFMS (LECO Pegasus TOF coupled to an Agilent Technologies 7890A GC). The GC-MS data acquisition and preprocessing were performed following the methods we reported previously (10). For targeted analysis, 46 metabolites from the following three sources were considered: (i) metabolites with statistically significant changes in the untargeted analysis of the samples derived from 120 participants of the same cohort (see ref. 10 for details on the statistical method), (ii) metabolites selected from our previous GC-MS study conducted on an Egyptian cohort (10), and (iii) metabolites retrieved from the literature by text mining. Targeted quantification was performed in selected ion monitoring (SIM) mode by using the GC-qMS platform, as described previously (10). For each analyte, four ions were selected on the basis of their specificity, where one ion was used as a quantifier for intensity calculation and the other three as qualifiers for confirmation. The fragments were manually selected on the basis of the uniqueness across coeluting analytes and their relative intensity compared with the base peak in the spectrum. Time segments were set up to allow at least 10 msec dwell time for each ion monitored. The complete list of the targeted metabolites (together with the IS) is shown in Supplementary Table S1. The GC-SIM-MS data were preprocessed by the Automated Mass Spectral Deconvolution and Identification System (AMDIS) for peak detection, deconvolution, and identification (11). The resulting peaks were aligned using Mass Profiler Professional (MPP) from Agilent Technologies.
Selection of metabolites and clinical covariates by LASSO
Two LASSO regression models were applied, one to select a set of metabolites and one to select a set of clinical covariates, based on their association with disease status (HCC or cirrhosis). For the metabolites, the data matrix was obtained by preprocessing the GC-SIM-MS runs of the 84 plasma samples on the basis of the intensity of the quantifier ion selected for each of the 46 metabolite targets. The metabolite intensities were log-transformed, and the batch effect was removed using ComBat in R. For the clinical covariates, AFP measurements (reported in ng/mL) were also log-transformed to satisfy the assumption of linearity with the log-odds of HCC status. For both LASSO models, the tuning parameter was chosen by leave-one-out cross-validation with deviance as the loss function. A univariate logistic regression model was also fit on each individual metabolite to examine its association with the risk of HCC. Adjusted P values were calculated following the Benjamini–Hochberg procedure (12). To further investigate the performance of the metabolites for early detection of HCC, a multinomial logistic regression model was fit with HCC stages I and II combined as one group and the cirrhotic controls as the reference group.
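For orientation, the LASSO logistic regression used for selection can be written in its standard textbook form. This is a sketch of the general formulation, not a statement of the exact software settings used in the study; the symbols $y_i$ (1 for HCC, 0 for cirrhosis), $x_i$ (the vector of log-transformed, batch-corrected predictors for subject $i$), $\beta_0$, $\beta$, $\lambda$, and $\hat{p}_{-i}$ are notation introduced here.

```latex
(\hat{\beta}_0, \hat{\beta}) = \arg\min_{\beta_0,\,\beta}
\left\{ -\frac{1}{n} \sum_{i=1}^{n}
\left[ y_i \left(\beta_0 + x_i^{\top}\beta\right)
- \log\!\left(1 + e^{\beta_0 + x_i^{\top}\beta}\right) \right]
+ \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert \right\}

\mathrm{CV}(\lambda) = -\frac{2}{n} \sum_{i=1}^{n}
\left[ y_i \log \hat{p}_{-i}(x_i) + (1 - y_i) \log\!\left(1 - \hat{p}_{-i}(x_i)\right) \right]
```

Here $\hat{p}_{-i}(x_i)$ denotes the predicted probability of HCC for subject $i$ from the model fit without that subject. The tuning parameter $\lambda$ is taken as the value minimizing the leave-one-out deviance $\mathrm{CV}(\lambda)$, and the predictors with nonzero $\hat{\beta}_j$ at that value form the selected set.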
Performance evaluation of predictors
Logistic regression models and support vector machines (SVM) were built to evaluate the performance of the predictors selected by LASSO. We evaluated the performances of four sets of predictors: (i) AFP measurements only; (ii) clinical covariates selected by LASSO; (iii) metabolites selected by LASSO; and (iv) the combination of (ii) and (iii). Receiver operating characteristics (ROC) curves and 95% confidence interval (CI) of area under the ROC curve (AUC) calculated on the basis of leave-one-out cross-validation were used for performance evaluation.
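The evaluation scheme described above can be illustrated with a minimal sketch of a leave-one-out cross-validated AUC. The scorer below is a deliberately simple stand-in (projection onto the difference of class means), not the logistic regression or SVM models used in the study, and all function names are hypothetical. The AUC is computed from the pooled held-out scores with the Mann-Whitney statistic.

```python
from typing import List, Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve via the Mann-Whitney statistic (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def centroid_score(train_X: List[List[float]], train_y: List[int],
                   x: List[float]) -> float:
    """Stand-in scorer (assumption): project x onto the case-minus-control
    mean direction, centered at the midpoint of the two class means.
    Requires at least one case and one control in the training rows."""
    dim = len(x)

    def mean(rows: List[List[float]]) -> List[float]:
        return [sum(r[j] for r in rows) / len(rows) for j in range(dim)]

    mu1 = mean([r for r, y in zip(train_X, train_y) if y == 1])
    mu0 = mean([r for r, y in zip(train_X, train_y) if y == 0])
    return sum((mu1[j] - mu0[j]) * (x[j] - (mu1[j] + mu0[j]) / 2.0)
               for j in range(dim))


def loocv_auc(X: List[List[float]], y: List[int]) -> float:
    """Score each subject with a model fit on all other subjects,
    then compute one AUC from the pooled held-out scores."""
    held_out = []
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        held_out.append(centroid_score(train_X, train_y, X[i]))
    return auc(held_out, y)
```

Because every subject is scored by a model that never saw it, the pooled AUC is not inflated by fitting and testing on the same sample; swapping `centroid_score` for a fitted logistic regression or SVM would leave the rest of the loop unchanged.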
Correlation analysis
Pearson correlation coefficients were calculated to investigate associations between the LASSO-selected metabolites. Separate correlation graphs were obtained for the HCC and cirrhotic groups by using the R corrplot package. The P values were adjusted for multiple comparisons by the Benjamini–Hochberg procedure (12), with the significance cutoff set to 0.05. Associations between LASSO-selected metabolites and a subset of clinical covariates were also investigated, excluding 17 patients who had missing values for the clinical covariates.
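The two computations named in this section can be sketched in a few lines. The code below is an illustration using only the Python standard library, not the R code used in the study; the function names are hypothetical, and the P values passed to the adjustment are assumed to have been obtained separately (for example from a t-test on each correlation coefficient).

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def benjamini_hochberg(pvals: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices by ascending P
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

A correlation would then be called significant when its adjusted P value falls below the 0.05 cutoff stated above.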
Metabolites ID verification by standards
Identities of the majority of the metabolites selected by LASSO were confirmed by analysis of authentic compounds purchased from Sigma-Aldrich: l-valine (PHR1172), glycine (G7403), d-, l-isoleucine (298689), creatinine (C4255), l-pyroglutamic acid (83160)/[l-glutamic acid (95436)], alpha-d-glucosamine 1-phosphate (G9753), tagatose (T2751) [sorbose (S4887)], linoleic acid (L1376), and lauric acid (61609). Individual 0.25 mg/mL stock standard solutions were prepared in the appropriate solvent and stored at −20°C until analysis. Working standard solutions, at a concentration of 1.25 μg/mL, were prepared by appropriate dilution of the stock standard solutions in acetonitrile, isopropanol, and water (3:3:2). Standards were then concentrated to dryness and derivatized following the same procedure applied to plasma metabolites, as described in the “Metabolite extraction” section. Each standard was analyzed on both the GC-qMS and GC-TOF-MS platforms, following the same GC and MS methods as described in Ranjbar and colleagues (10). Acquired spectra of the individual standards were cross-matched with the corresponding spectra extracted from the analysis of the plasma samples.
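Going from the 0.25 mg/mL stock to the 1.25 μg/mL working solution corresponds to a 200-fold dilution. The short sketch below works through that arithmetic using the usual C1·V1 = C2·V2 relation; the 1,000 μL final volume is a hypothetical value chosen for illustration and is not stated in the text, and the function names are likewise hypothetical.

```python
def dilution_factor(stock_ug_per_ml: float, working_ug_per_ml: float) -> float:
    """Fold dilution needed to go from the stock to the working concentration."""
    if working_ug_per_ml <= 0 or stock_ug_per_ml < working_ug_per_ml:
        raise ValueError("working concentration must be positive and not exceed stock")
    return stock_ug_per_ml / working_ug_per_ml


def stock_volume_ul(stock_ug_per_ml: float, working_ug_per_ml: float,
                    final_volume_ul: float) -> float:
    """Volume of stock to dilute up to final_volume_ul (C1 * V1 = C2 * V2)."""
    return final_volume_ul / dilution_factor(stock_ug_per_ml, working_ug_per_ml)


# 0.25 mg/mL stock = 250 ug/mL; working solution = 1.25 ug/mL.
factor = dilution_factor(250.0, 1.25)            # 200-fold
volume = stock_volume_ul(250.0, 1.25, 1000.0)    # 5 uL of stock per 1,000 uL (assumed volume)
```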
Network and pathway analysis
To investigate the association among LASSO-selected metabolites, group-specific networks were built for HCC cases and cirrhotic controls through a Gaussian graphical model (GGM) and the graphical LASSO algorithm implemented in the R glasso package (13). In a GGM network, a connection between two nodes indicates conditional dependence between them given all the others; equivalently, the absence of a connection indicates conditional independence. The sparsity parameter was tuned on the basis of a 10-fold cross-validation applying the one standard error rule (14). The shared and group-specific connections between the HCC and cirrhotic GGM networks were also investigated. The shared connections between the two GGM networks were further evaluated by recovering metabolites that were not detected in our experiment but are reported in the Kyoto Encyclopedia of Genes and Genomes (KEGG) database to interact with a pair of nodes in the GGM networks. This was accomplished by using the Matlab MetaboNetworks toolbox to find the shortest path between each metabolite pair in the KEGG database (15). To gain a deeper understanding of the biological relation between the metabolites detected in our study and those derived from KEGG in HCC and cirrhotic patients, we performed pathway enrichment analysis using MetaboAnalyst 3.0 (16).
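The GGM and graphical LASSO referred to above have a standard formulation, sketched here for orientation. The notation is introduced here and is not taken from the study: $S$ is the sample covariance matrix of the metabolite levels within one group, $\Theta = (\theta_{ij})$ is the precision (inverse covariance) matrix, $\rho$ is the sparsity parameter, and $\lVert \Theta \rVert_1$ is the elementwise $\ell_1$ norm (whether the diagonal is penalized depends on the implementation).

```latex
\hat{\Theta} = \arg\max_{\Theta \succ 0}
\left\{ \log\det\Theta - \operatorname{tr}(S\Theta) - \rho \lVert \Theta \rVert_1 \right\},
\qquad
r_{ij \mid \mathrm{rest}} = -\frac{\hat{\theta}_{ij}}{\sqrt{\hat{\theta}_{ii}\,\hat{\theta}_{jj}}}
```

Two metabolites $i$ and $j$ are connected in the network when $\hat{\theta}_{ij} \neq 0$, that is, when their partial correlation $r_{ij \mid \mathrm{rest}}$ given all other metabolites is nonzero. Larger values of $\rho$ shrink more entries of $\hat{\Theta}$ to zero and yield a sparser network.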
Results
Metabolites selected by LASSO regression
LASSO regression selected 11 metabolites whose levels jointly differentiate HCC cases from cirrhotic controls. Table 2 presents statistical results for these variables based on multivariable analysis and univariate logistic regression (P values and FDR-adjusted P values), along with fold changes (mean and median values calculated on the raw intensities after correcting for the batch effect). Their fold changes ranged between +2.48 (alpha-d-glucosamine 1-phosphate) and −2.18 (tagatose). Figure 1 depicts their dot-plots. The selected metabolites include amino acids and their derivatives (valine, serine, glycine, isoleucine, creatinine, and pyroglutamic acid/glutamic acid), sugars and their derivatives (a furanose sugar and alpha-d-glucosamine 1-phosphate), fatty acids (linoleic acid and lauric acid), and one inorganic acid (phosphoric acid). Among the metabolites selected by LASSO, those found to be significant in the multinomial logistic regression model as discriminating stage I and II HCC cases from cirrhotic controls are indicated in Table 2. While we were able to confirm with high confidence (match value >850) the identity of nine of the selected metabolites by comparing their fragmentation patterns with those from commercial and/or in-house libraries, we were unable to determine with certainty the identity of the metabolite belonging to the class of furanose sugars. However, based on the similarity with the RI of the standard, we determined tagatose as the likely identification. We also could not distinguish between pyroglutamic acid and glutamic acid on the basis of our data. Although both names (pyroglutamic acid and glutamic acid) are kept as the identification of this metabolite in the following sections of the article, only glutamic acid was chosen for investigation by literature search and pathway analysis, because pyroglutamic acid is most likely a product of glutamate cyclization during the chemical derivatization process (17).
Higher levels of valine, serine, isoleucine, alpha-d-glucosamine 1-phosphate, and linoleic acid were found in HCC cases, while glycine, creatinine, glutamic acid, tagatose, lauric acid, and phosphoric acid were found elevated in cirrhotic controls.
| Category | Predictor | Fold change (raw), mean | Fold change (raw), median | Multivariable analysis: coefficient (P value) | Univariate logistic regression: coefficient (P value–adjusted P value^a) |
|---|---|---|---|---|---|
| Amino acids and derivatives | Valine^b,c | ↑ +1.42 | ↑ +1.46 | 1.249 (0.036) | 1.016 (0.007–0.091) |
| | Serine | ↑ +1.23 | ↑ +1.13 | 0.958 (0.042) | 0.428 (0.152–0.381) |
| | Glycine^b | ↓ −1.31 | ↓ −1.21 | −1.821 (0.073) | −2.117 (0.002–0.091) |
| | Isoleucine^b,c | ↑ +1.29 | ↑ +1.28 | 1.251 (0.138) | 1.301 (0.021–0.168) |
| | Creatinine | ↓ −1.58 | ↓ −1.53 | −0.782 (0.138) | −0.508 (0.086–0.330) |
| | Pyroglutamic acid/glutamic acid^b,c | ↓ −1.24 | ↓ −1.19 | −2.101 (0.079) | −2.101 (0.006–0.0914) |
| Sugars and alcohols | Alpha-d-glucosamine 1-phosphate | ↑ +1.49 | ↑ +2.48 | 0.521 (0.012) | 0.146 (0.160–0.381) |
| | Tagatose (furanose sugar)^b,c | ↓ −2.18 | ↓ −1.97 | 0.092 (0.633) | −0.359 (0.011–0.102) |
| Fatty acids | Linoleic acid^b | ↑ +1.42 | ↑ +1.77 | 1.808 (0.007) | 0.863 (0.005–0.091) |
| | Lauric acid | ↓ −1.31 | ↓ −1.35 | −1.344 (0.024) | −0.543 (0.123–0.355) |
| Inorganic acid | Phosphoric acid | ↓ −1.03 | ↓ −1.01 | −1.530 (0.140) | −0.331 (0.514–0.685) |
| Clinical covariates | AFP | ↑ +1.99 | ↑ +2.53 | 0.405 (0.032) | 0.547 (0.003–0.065) |
| | Child–Pugh score | n/a | n/a | −0.287 (0.142) | −0.469 (0.005–0.065) |
| | Etiology (alcohol vs. HCV) | n/a | n/a | −1.725 (0.168) | −2.287 (0.042–0.283) |
| | Etiology (HBV vs. HCV) | n/a | n/a | 16.929 (0.993) | 17.358 (0.994–0.994) |
| | Etiology (NAFLD vs. HCV) | n/a | n/a | −0.378 (0.827) | −0.901 (0.482–0.765) |
| | Etiology (Other vs. HCV) | n/a | n/a | −0.851 (0.370) | −1.055 (0.179–0.479) |
^a Multiple testing-adjusted P values.
^b Metabolites found to be significant in the multinomial logistic regression model as discriminating HCC stages I and II from cirrhotic controls.
^c Metabolites previously reported in our GC-MS study on an Egyptian cohort (10).
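The signed fold changes in Table 2 appear to follow the common convention in which a case-to-control ratio below 1 is reported as its negative reciprocal. This is an assumption inferred from the +/− values in the table, not something the text states. Under that assumption, a minimal sketch of the calculation is shown below; the function name and the example intensities are hypothetical.

```python
from statistics import mean, median
from typing import Callable, Sequence


def signed_fold_change(case_values: Sequence[float],
                       control_values: Sequence[float],
                       summary: Callable[[Sequence[float]], float] = mean) -> float:
    """Signed fold change of cases versus controls (assumed convention).

    Returns +ratio when the case summary is at least the control summary,
    and -1/ratio otherwise, so that a halving is reported as -2.0.
    Intensities must be positive (raw scale, not log scale).
    """
    case_level = summary(case_values)
    control_level = summary(control_values)
    if case_level <= 0 or control_level <= 0:
        raise ValueError("summaries must be positive; use raw intensities")
    ratio = case_level / control_level
    return ratio if ratio >= 1.0 else -1.0 / ratio


# Hypothetical intensities: cases are twice the controls on average.
up = signed_fold_change([2.0, 4.0], [1.0, 2.0])          # +2.0
down = signed_fold_change([1.0, 2.0], [2.0, 4.0])        # -2.0
med = signed_fold_change([1.0, 2.0, 9.0], [2.0, 2.0, 2.0], summary=median)  # +1.0
```

Passing `summary=median` gives the median-based column, which is less sensitive to a few extreme intensities than the mean-based one.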
Clinical covariates selected by LASSO regression
LASSO regression analysis applied to the clinical variables selected AFP (dot-plot in Fig. 1), Child–Pugh score, and etiologic factors comprising alcohol, viral infection (HBV, HCV), non-alcoholic fatty liver disease (NAFLD), and other less frequent etiologies including autoimmune, primary biliary cirrhosis (PBC), primary sclerosing cholangitis (PSC), and cryptogenic. The results of multivariable analysis and univariate logistic regression of these clinical covariates are shown in Table 2. We did not anticipate that age, sex, and ethnicity would be selected by the LASSO model, because they were matched between HCC cases and cirrhotic controls in our study. We nevertheless included them in the analysis and, as expected, they were not found to be significant.
Performance evaluation of predictors
Through a leave-one-out cross-validation, we evaluated the performances of the predictors selected by LASSO in terms of their ability to distinguish cirrhotic controls from early-stage HCC by excluding the group of patients with HCC stage III (n = 3). Figures 2A–C present box plots for AUC values and the corresponding 95% CI calculated on the basis of the training set during leave-one-out cross-validation of logistic regression and SVM models. Figures 2D and E depict ROC curves obtained while testing the logistic regression and SVM models during the leave-one-out cross-validation. As shown in these figures, the logistic regression model built by the LASSO-selected metabolites (AUC = 0.808) and clinical covariates (AUC = 0.788) led to improved performance compared with AFP (AUC = 0.723). Although the logistic regression model built by combining the LASSO-selected metabolites with clinical covariates in a panel (AUC = 0.733) did not perform well, the SVM built using these predictors outperformed (AUC = 0.857) the remaining three sets of predictors: LASSO-selected metabolites (AUC = 0.805), LASSO-selected clinical covariates (AUC = 0.786), and AFP (AUC = 0.712).
Correlation analysis
Pearson correlation among the panel of LASSO-selected metabolites showed strong correlations among amino acids, fatty acids, alpha-d-glucosamine 1-phosphate, and pyroglutamic acid/glutamic acid in HCC patients (Supplementary Fig. S1). In particular, creatinine is strongly correlated with pyroglutamic acid/glutamic acid, alpha-d-glucosamine 1-phosphate, and lauric acid. Lauric acid is also positively correlated with pyroglutamic acid/glutamic acid, alpha-d-glucosamine 1-phosphate, and linoleic acid. In cirrhotic controls (Supplementary Fig. S2), on the other hand, phosphoric acid is positively correlated with linoleic acid, which in turn shows a moderate-to-strong negative correlation with the furanose sugar. Valine shows a moderate-to-strong positive correlation with isoleucine in both HCC cases and cirrhotic controls. In addition to the correlation among the LASSO-selected metabolites, we investigated their relationship with two of the three clinical covariates selected by LASSO (AFP and Child–Pugh score). In this panel, AFP was negatively correlated with glycine, creatinine, and Child–Pugh score in cirrhotic controls. The Child–Pugh score was negatively correlated with isoleucine in both groups of patients and with creatinine in HCC cases only.
Network and pathway analysis
Figure 3 shows the GGM networks built separately for HCC cases and cirrhotic controls, along with the merged network. As shown in the merged graph (Fig. 3, merged panel), there are four connected pairs, composed of five metabolites with statistically significant differences between the HCC and cirrhotic groups (darker nodes). The five metabolites are alpha-d-glucosamine 1-phosphate, valine, serine, lauric acid, and linoleic acid. Among them, alpha-d-glucosamine 1-phosphate serves as a hub metabolite connected to all the other four. Further evaluation of these connections, by searching the KEGG database (15) for the shortest path between each metabolite pair, revealed 22 metabolites connecting them, as shown in Supplementary Fig. S3. Metabolite names and KEGG IDs for the original and recovered analytes are listed in Supplementary Table S2. Pathway enrichment analysis using MetaboAnalyst 3.0, based on the original and recovered metabolites derived from the GGM network analysis (Supplementary Fig. S3), showed the involvement of the selected metabolites in nine specific pathways common to both the HCC and cirrhotic groups (Supplementary Table S3).
Discussion
In this study, we conducted untargeted and targeted analysis of metabolites in plasma samples from HCC cases and patients with liver cirrhosis. LASSO regression analysis of the metabolomic data selected 11 metabolites and three clinical covariates, including AFP. Combined in a panel by SVM, these predictors showed improved performance in disease classification compared with AFP alone (Fig. 2B). If successfully validated, the panel could improve the ability to detect and monitor HCC in the high-risk population of cirrhotic patients.
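As an illustration only, the comparison of a panel against AFP by area under the ROC curve can be sketched with the rank-based definition of the AUC. The labels and scores below are made-up placeholder values, not results from this study; in practice the scores would be the held-out predictions from the leave-one-out cross-validation. The function name `roc_auc` is a hypothetical helper.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a randomly chosen case (label 1) scores
    higher than a randomly chosen control (label 0); ties count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical held-out scores: 1 = HCC case, 0 = cirrhotic control.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
afp_scores = [0.9, 0.4, 0.6, 0.3, 0.5, 0.2, 0.7, 0.1]
panel_scores = [0.9, 0.7, 0.8, 0.4, 0.5, 0.2, 0.3, 0.1]

auc_afp = roc_auc(labels, afp_scores)
auc_panel = roc_auc(labels, panel_scores)
```

With these placeholder values the panel separates the two groups better than the single marker; the sketch shows only how such a comparison is computed, not the magnitude of the difference observed in Fig. 2B.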
Among the 11 LASSO-selected metabolites, four (valine, isoleucine, glutamic acid, and the furanose sugar) were also found to be statistically significant in a GC-MS–based metabolomic analysis that we conducted previously using plasma samples from HCC cases and patients with liver cirrhosis recruited in Tanta, Egypt (10). Of the three branched-chain amino acids (BCAA), valine and isoleucine were elevated in HCC versus cirrhosis in both the United States and Egyptian cohorts. Although the difference was not statistically significant, leucine also showed an increased level in HCC versus cirrhosis, consistent with our previous findings (10). BCAAs have been reported to have a crucial role in cancer by regulating the anabolic process involving protein synthesis and degradation, needs that are shared by both tumor and normal cells (18). The severe muscle wasting syndrome experienced by many patients with cancer has motivated the use of BCAA supplements, which are already used extensively in athletics to improve performance and muscle mass. Therefore, the use of BCAAs as biomarkers is challenging because of the competing energetic and proliferative demands in both healthy and disease states (18–20). However, high levels of BCAAs in HCC samples could reflect their potential tumorigenic effect in the liver, and BCAAs may be a significant component of diagnostic testing panels for monitoring the risk of cancer (19). Whereas we found a reduced level of glutamic acid in HCC versus cirrhosis in this study, we observed an increased level in our previous study involving the Egyptian cohort (10). In another plasma metabolomics study conducted on patients with liver diseases (21), the level of glutamic acid was decreased in all three types of liver disease (hepatitis, cirrhosis, HCC) when compared with healthy controls. According to the authors, this marked reduction can be explained by altered activities, and an altered ratio, of two monitored transaminases across the three groups of patients.
Among the LASSO-selected sugars, tagatose appears to be downregulated in HCC, as was sorbose, another furanose sugar identified in our previous study using the Egyptian cohort (10). Complementary techniques aimed at discriminating sugar isoforms will be necessary to investigate the nature of these sugars and their contribution to promoting hypoxia-inducible factors, which are prevalent in low-oxygen environments such as solid tumors like HCC (17).
The selection by LASSO of multiple clinical variables in addition to AFP appears to agree with several epidemiologic and clinical studies showing that the sensitivity of early detection of HCC in clinical practice increases when longitudinal data are incorporated or patient characteristics are adjusted for, in addition to the conventional AFP assay (22).
Our correlation analysis shows that valine has a moderate positive correlation with isoleucine in HCC cases and a strong positive correlation in cirrhotic controls, suggesting its connection not only to cancer but also to other liver diseases such as cirrhosis, as reported previously (18). Also, the Child–Pugh score, a prognostic indicator of liver disease and of the need for liver transplantation, was negatively correlated with isoleucine in both groups of patients. This correlation has been reported previously in patients with liver diseases in whom muscle and blood amino acid metabolism was investigated (23).
The pathway enrichment analysis (Supplementary Table S3; Supplementary Fig. S3) reveals the hepatic metabolome interchange between lipids and water-soluble metabolites, which is crucial for liver energy production and consumption (24) and therefore essential for the aberrant metabolic reprogramming that occurs in cancer cells (25).
In summary, combining metabolites with clinical covariates, including AFP, yielded a better area under the ROC curve for distinguishing early HCC cases in patients with liver cirrhosis than AFP alone. Previous HCC-related metabolomics studies, conducted using complementary metabolomics platforms and multivariate analysis, including a GC-MS study conducted on an Egyptian cohort, reported findings similar to those presented in this paper. Because of the small sample size of this study, replication of these findings in a larger cohort that includes samples representing diverse populations is desired. Most of the clinical covariates selected by LASSO are commonly reported as HCC risk factors. Thus, following appropriate validation, the metabolites discovered in this study, combined in a panel with the selected clinical covariates, could contribute to a better understanding of the development of HCC and improve our ability to detect early-stage HCC in patients with liver cirrhosis.
Disclosure of Potential Conflicts of Interest
No potential conflicts of interest were disclosed.
Authors' Contributions
Conception and design: C. Di Poto, A. Ferrarini, R.S. Varghese, K. Shetty, H.W. Ressom
Development of methodology: C. Di Poto, A. Ferrarini, Y. Luo
Acquisition of data (provided animals, acquired and managed patients, provided facilities, etc.): A. Ferrarini, R.S. Varghese, Y. Luo, C.S. Desai
Analysis and interpretation of data (e.g., statistical analysis, biostatistics, computational analysis): C. Di Poto, A. Ferrarini, Y. Zhao, R.S. Varghese, C. Tu, Y. Zuo, M. Wang, M.R. Nezami Ranjbar, C. Zhang, M.G. Tadesse
Writing, review, and/or revision of the manuscript: C. Di Poto, A. Ferrarini, Y. Zhao, R.S. Varghese, Y. Zuo, C.S. Desai, K. Shetty, M.G. Tadesse, H.W. Ressom
Administrative, technical, or material support (i.e., reporting or organizing data, constructing databases): C. Di Poto, A. Ferrarini, K. Shetty
Study supervision: C. Di Poto, A. Ferrarini, H.W. Ressom
Acknowledgments
The authors acknowledge Dr. Tsung-Heng Tsai for providing constructive comments.
Grant Support
This work was supported by U01CA185188 (awarded to H.W. Ressom).
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.