Abstract
Hepatocellular carcinoma (HCC) undergoes a stepwise progression from liver cirrhosis to low-grade dysplastic nodule (LGDN), high-grade dysplastic nodule (HGDN), early HCC (eHCC), and progressed HCC (pHCC). Here, we profiled multilayered genomic, epigenomic, and transcriptomic aberrations in the stepwise hepatocarcinogenesis. Initial DNA methylation was observed in eHCC (e.g., DKK3, SALL3, and SOX1) while more extensive methylation was observed in pHCC. In addition, eHCCs showed an initial loss of DNA copy numbers of tumor suppressor genes in the 4q and 13q regions, thereby conferring survival benefits to cancer cells. Transcriptome analysis revealed that HGDNs expressed endoplasmic reticulum (ER) stress–related genes, while eHCC started to express oncogenes. Furthermore, integrative analysis indicated that expression of the serine peptidase inhibitor, Kazal type 1 (SPINK1), played a pivotal role in eHCC development. Significant demethylation of SPINK1 was observed in eHCC compared to HGDN. The study also demonstrated that ER stress may induce SPINK1 demethylation and expression in liver cancer cells. In conclusion, these results reveal the dynamics of multiomic aberrations during malignant conversion of liver cancer, thus providing novel pathobiological insights into hepatocarcinogenesis.
Multiomics profiling and integrative analyses of stepwise hepatocarcinogenesis reveal novel mechanistic and clinical insights into hepatocarcinogenesis.
Introduction
Hepatocellular carcinoma (HCC) classically develops through a multistep process involving a series of pathologic states: liver cirrhotic lesion (LC), dysplastic nodule (DN), early HCC (eHCC), and progressed HCC (pHCC). Liver injury induced by viral hepatitis or other etiologic factors produces a chronic inflammatory milieu leading to the development of LCs. LCs subsequently develop into DNs, which are further classified as low-grade or high-grade DNs (LGDN or HGDN, respectively) based on the presence of cytological and architectural atypia. Dysplastic lesions eventually progress to eHCC and pHCC. Molecular alterations accompanying this stepwise pathological sequence have been elucidated to determine definitive markers for eHCC. At present, key cancer-associated pathways, such as Notch, Toll-like receptor, MYC, TGFβ, WNT, and epithelial–mesenchymal transition (EMT)–related signaling, have been demonstrated as active during hepatocarcinogenesis (1–4). In addition, high-throughput genome-wide studies have revealed genomic aberrations that occur during stepwise HCC development, including DNA methylation (5, 6), DNA copy number changes, mutations (e.g., TP53, CTNNB1, or TERT promoter; refs. 7–9), and transcriptional deregulation (10). Recently, multiomic studies, such as those involving The Cancer Genome Atlas (TCGA), showed have enabled an integrated and systematic view of the genomic aberrations associated with liver cancer, revealing novel biomarkers, as well as therapeutic targets (11–13).
Recently, multiomic studies shifted their focus toward precancerous lesions, resulting in the creation of the Pre-Cancer Atlas (PCA), an integrated collection of molecular, structural, and functional maps that provide information regarding tumor initiation and progression (14). The PCA is expected to delineate mechanisms underlying tumor development or progression to enhance early detection, risk prediction, and the development of new therapeutic strategies.
In this study, we generated multiomic profiling data that demonstrated a molecular landscape of the multilayered aberrations that occur during stepwise hepatocarcinogenesis from precancerous lesions to malignant development of eHCC and pHCC. By performing integrative analyses, we identified early events that may possibly promote malignant conversion of precancerous lesions, thereby providing novel mechanistic insights into HCC development. These data may facilitate upcoming studies focused on discovering novel biomarkers or therapeutic targets for HCC.
Materials and Methods
Patients and tissue specimens
A total of 131 tissue specimens, including those of LC (n = 30), HGDN (n = 28), early HCC (eHCC, n = 30), and progressed HCC (pHCC, n = 43), resected from 76 patients was obtained (YSHCC, HCC cohorts from Yonsei University, Seoul, Korea). All lesions were classified according to the criteria stipulated by the International Consensus Group for Hepatocellular Neoplasia (15). Freshly frozen specimens were obtained from the Liver Cancer Specimen Bank (part of the National Research Bank Program, Korea Science and Engineering Foundation, Ministry of Science and Technology), and subjected to RNA sequencing (RNA-seq) profiling. Tissues of LGDNs were not included in this study due to limited availability of frozen tissues. Clinicopathologic features of the 76 patients are summarized (Supplementary Table S1). This study was approved by the Institutional Review Board of Severance Hospital, Yonsei University College of Medicine (4-2014-0423), and the requirement for informed consent was waived.
For validation via IHC staining, a total of 171 liver specimens from 90 patients was examined, which included LC (n = 30), LGDN (n = 30), HGDN (n = 40), eHCC (n = 40), and pHCC (n = 41) specimens. Patients were 34 to 76 years in age (57.7 ± 8.19, mean ± SD) and consisted of 75 males and 15 females. Most etiologies were HBV (n = 74), and the remainder included HCV (n = 5), alcohol (n = 7), Nonalcoholic steatohepatitis (NASH, n = 1), occult HBV potential (n = 1), or of unknown etiology (n = 2). Clinicopathologic features of the 90 patients subjected to immunostaining are summarized in Supplementary Table S2.
RNA-seq profiling and variant profiling
RNA-seq profiling was performed in the YSHCC specimens (n = 131), including LC (n = 30), HGDN (n = 28), eHCC (n = 30), and pHCC (n = 43). Sequencing data were obtained using an Illumina HiSeq2000 for 100 bp paired-end reads with a coverage greater than 42 million reads per sample. Data processing for measuring RNA abundance and variant calling was performed using an R package “SEQprocess” (16). RNA abundance was estimated by using Tophat2-Cufflinks and the sequence variants were identified using a Genome Analysis Toolkit (GATK, https://software.broadinstitute.org/gatk). Wild- and mutant-type variants were determined with the cutoff of read counts greater than 8, respectively. Otherwise, the variants were designated as missing values.
Profiling of DNA methylation and DNA copy number aberration
DNA methylation profiling was performed in the subset of the YSHCC, including LC (n = 6), HGDN (n = 11), eHCC (n = 9), and pHCC (n = 6) using an Infinium Human Methylation 450K BeadChip. Data processing and analyses were performed using R/Bioconductor libraries. For DNA methylation profile, probe level β-values were imported using “RnBeads” library. DNA copy numbers were estimated from the DNA methylation data using “ChAMP.” Batch effect was corrected by “combat,” and DNA copy number segments were calculated by using a circular binary segmentation algorithm with default parameters.
Validation of public data
Publicly available DNA copy number and somatic mutation profiles of HCC were obtained from The Cancer Genome Atlas (TCGA-LIHC, https://cancergenome.nih.gov). The segmented DNA copy number values were mapped to gene level aberration. Gene lists with copy number gains and losses were obtained from a previous study by the International Cancer Genome Consortium (LICA-FR, https://dcc.icgc.org). Data were obtained from TCGA and GEO websites (accession no.; GSE89377, GSE6764, GSE25097, GSE14520, GSE44970, GSE60753, GSE73003, GSE4024, GSE87630, and GSE65373), to validate the expression, methylation, and survival analyses related to our finding. Gene sets obtained from literature or public data were used for data analyses (Supplementary Table S3).
Further details on the methods utilized for profiling of DNA methylation, DNA copy number, RNA-seq, variant calling, network analysis, and the molecular experiments including IHC, RT-PCR, Western blot analysis, gene expression constructs, lentiviral vector transfection, and reagents have been described (Supplementary Materials and Methods).
Data availability
Data from genomic profiles are available in the GEO database (http://www.ncbi.nlm.nih.gov/projects/geo) under accession number GSE99036.
Results
Stepwise aberration of DNA methylation in HCC
Tissue specimens of LC, HGDN, eHCC, and pHCC were diagnosed as described in the Materials and Methods. Multiomic profiles of DNA methylation (n = 32), DNA copy numbers (n = 32), and mRNA expression (n = 131) were generated, and analysis results were validated using tissue microarrays (n = 171) and public data from TCGA and GEO databases. The overall study design is summarized (Supplementary Fig. S1).
First, aberrations of DNA methylation during stepwise HCC progression in each group (LC, HGDN, eHCC, and pHCC, n = 32) were evaluated. Overall, no significant differences in global methylation status were observed between the groups (Supplementary Fig. S2A and S2B). To determine methylome changes during hepatocarcinogenesis, we identified 595 differentially methylated probes (DMP) for each group in comparison with the group preceding it (i.e., HGDN vs. LC, eHCC vs. HGDN, and pHCC vs. eHCC; asymptotic one-way Fisher–Pitman permutation test P < 0.001 and fold difference > 0.1). Excluding the DMPs identified in earlier stage groups, we identified the DMPs of each group for HGDN (DMPHGDN, n = 50), eHCC (DMPeHCC, n = 56), and pHCC (DMPpHCC, n = 489), respectively (Supplementary Table S4). The DMPs showed predominant hypermethylation (n = 555) rather than hypomethylation (n = 40) during HCC development (Fig. 1A), including previously known methylated genes in HCCs such as TBX4, CCNA1, PENK, WT1, and TRIM58 (5, 17). In addition, we identified methylation of DKK3, SOX1, and SALL3 in eHCC, with each of these events having been previously reported in pHCC samples (18–20). The DMPpHCC was significantly enriched with development-related genes [enrichment score (ES) = 6.42; P = 1.65 × 10−10], indicating the dysregulation of developmental processes during pHCC progression (Fig. 1B). Comparison of each group revealed that the DMPpHCC were markedly methylated and showed mid-range beta values (0.25–0.75), whereas the other DMPs were hypomethylated and showed lower beta values (Fig. 1C). Moreover, the DMPs were preferentially located at 5′ untranslated regions, indicating their functional significance (Supplementary Fig. S3, top). Most hypermethylated DMPs resided in CpG-island (CGI) loci, whereas this was not observed in hypomethylated DMPs (Supplementary Fig. S3, bottom).
Stepwise aberration of DNA methylation during HCC development. A, Heatmap showing DNA methylation levels of 595 DMPs. Eight top-ranked genes among the hypomethylated genes (n = 40) and the hypermethylated genes (n = 555) in eHCC or pHCC are shown. Previously known DNA methylated genes in HCCs are indicated (blue). B, A plot showing the functional enrichment scores of the genes mapped to DMPpHCC. Sphere size indicates the number of the genes in each gene set. C, Distribution of the average DMP beta-values in each group is shown. D, Average methylation levels in each group according to the genomic coordinates in relation to CGI are shown.
Stepwise aberration of DNA methylation during HCC development. A, Heatmap showing DNA methylation levels of 595 DMPs. Eight top-ranked genes among the hypomethylated genes (n = 40) and the hypermethylated genes (n = 555) in eHCC or pHCC are shown. Previously known DNA methylated genes in HCCs are indicated (blue). B, A plot showing the functional enrichment scores of the genes mapped to DMPpHCC. Sphere size indicates the number of the genes in each gene set. C, Distribution of the average DMP beta-values in each group is shown. D, Average methylation levels in each group according to the genomic coordinates in relation to CGI are shown.
DNA methylation at the flaking regions of CGIs has previously been known to play a critical role in cell differentiation and cancer development (21). Therefore, we explored methylation patterns in the flanking regions of CGIs, classified as Shore (within 2 kb of the CGI), Shelf (within 2–4 kb of the CGI), and Opensea regions (>4 kb from the CGI), and observed a distinctive pattern in the regional progression of DNA methylation among the groups (Fig. 1D). LC and HGDN exhibited lower DNA methylation levels in CGI regions, which increased markedly during eHCC and pHCC development. Notably, we found that the DNA methylation process initiated in the Shelf and Shore regions during HGDN development, with increased CGI region methylation during eHCC and pHCC development. These findings indicated that sequential DNA methylation initiated in CGI-flanking regions and extended into CGI regions during HCC development.
We then evaluated the consistency of our data with previous DMPs (i.e., DMP36 and DMP100 signatures) that were identified as being methylated in HCCs (mostly pHCC; GSE44970; n = 102; ref. 5). Although most DMP100 genes did not overlap with our DMPs, those genes did exhibit stepwise methylation during eHCC and pHCC development (Supplementary Fig. S4A, S4B). In addition, we validated stepwise methylation at CGI regions, although the methylation pattern at CGI-flanking regions was not discernible during stepwise progression of HCC, which might be due to the low-resolution of the data platform (27K probes; Supplementary Fig. S4C).
DNA copy number aberration during stepwise hepatocarcinogenesis
DNA copy numbers were obtained by inferring DNA methylation data (for details see Materials and Methods). We observed that the number of copy number aberration (CNA) segments, particularly the deleted segments, was increased markedly during eHCC and pHCC development (Supplementary Fig. S5A). Moreover, the sizes of the CNA segments became larger, implying a stepwise increase in genomic instability and aberration during hepatocarcinogenesis (Supplementary Fig. S5B).
Group comparison indicated that LC and HGDN exhibited no significant alteration of DNA copy numbers; however, eHCC samples showed genome-wide deletion of DNA copy numbers, and pHCC samples showed more profound and typical chromosomal losses and gains, as described previously (Fig. 2A; refs. 22, 23). To determine the stepwise alteration of CNAs more precisely, differential copy number aberrations (DCNAs) were estimated (one-way Fisher–Pitman permutation test P < 0.05 and fold difference > 0.05) between each group and their respective precedent group: HGDN versus LC (n = 182; DCNAHGDN), eHCC versus HGDN (n = 2,056; DCNAeHCC), and pHCC versus eHCC (n = 10,303; DCNApHCC). Notably, we found that eHCC exhibited prominent copy number losses at 4q and 13q as compared with HGDN (Fig. 2B). In addition, we observed that these regions harbored many tumor-suppressor genes (TSG), such as retinoblastoma 1 (RB1; Fig. 2C). In fact, oncogenes showed recurrent DNA copy number gains (e.g., MYC and CCND1), whereas TSG frequently lost their copy numbers (e.g., CDKN2A and PTEN; refs. 3, 24). To conduct a genome-wide evaluation of this hypothesis, we examined CNAs in the gene sets of oncogenes (n = 674) and TSGs (n = 1,088), and observed that oncogenes preferentially gained copy numbers (P = 0.001, Fisher exact test) while TSGs preferentially lost copy numbers with statistical significance (P = 0.005; Fig. 2D, left). This finding was validated in TCGA-HCC data (LIHC), which showed frequent DNA copy number gains in oncogenes (P = 0.058) and losses in TSGs (P = 3.65 × 10−8; Fig. 2D, right). Therefore, we suggest that the aberrations associated with DCNAeHCC are not random events, and instead may act as oncodrivers during malignant conversion. In contrast, DNA methylation profiles did not show preferential alterations of oncogenes or TSGs (Supplementary Fig. S6). These results imply that aberrations in DNA copy numbers more than DNA methylation may play a pivotal role during HCC progression.
DNA copy number aberration during stepwise hepatocarcinogenesis. A, Circos plots showing average CNA values according to the genomic coordinates in each group. B, Fold differences for the DCNAs for HGDN vs. LC (top), eHCC vs. DN (middle), and pHCC vs. eHCC (bottom) are plotted, and the chromosomal regional CNA gains (red lines) and losses (blue lines) are indicated, respectively. Early CNA alterations in eHCC at 4q and 13q are indicated (cyan lines). C, A heatmap shows the DCNAs according to the chromosomal locations. DCNA gains and losses from LICA-FR and TCGA-LIHC, and oncogenes and TSGs are indicated in sidebars (right). The 4q and 13q regions are indicated (yellow box), and oncogenes and TSGs in this region are shown in zoomed color bars (rightmost). D, Pie plots showing the proportion of the overlap between CNA genes and oncogenes (n = 674; top) or the TSGs (n = 1,088; bottom) in each group of YSHCC and TCGA-LIHC data, respectively. E–G, Correlations between CNAs enrichment scores (ESCNA) and the corresponding gene expression levels (ESEXP) of the DCNAeHCC (4q and 13q) genes are shown for YSHCC (n = 32; E), GSE65373 (n = 38; F), or TCGA-LIHC (n = 365; G), respectively. H, A Kaplan–Meier plot analysis of overall survival (OS) between the patients with gain (ESEXP > 0 and ESCNA > 0; n = 173) and loss (ESEXP < 0 and ESCNA < 0; n = 114) of the DCNAeHCC (4q and 13q) genes is shown for TCGA-LIHC data (bottom right).
DNA copy number aberration during stepwise hepatocarcinogenesis. A, Circos plots showing average CNA values according to the genomic coordinates in each group. B, Fold differences for the DCNAs for HGDN vs. LC (top), eHCC vs. DN (middle), and pHCC vs. eHCC (bottom) are plotted, and the chromosomal regional CNA gains (red lines) and losses (blue lines) are indicated, respectively. Early CNA alterations in eHCC at 4q and 13q are indicated (cyan lines). C, A heatmap shows the DCNAs according to the chromosomal locations. DCNA gains and losses from LICA-FR and TCGA-LIHC, and oncogenes and TSGs are indicated in sidebars (right). The 4q and 13q regions are indicated (yellow box), and oncogenes and TSGs in this region are shown in zoomed color bars (rightmost). D, Pie plots showing the proportion of the overlap between CNA genes and oncogenes (n = 674; top) or the TSGs (n = 1,088; bottom) in each group of YSHCC and TCGA-LIHC data, respectively. E–G, Correlations between CNAs enrichment scores (ESCNA) and the corresponding gene expression levels (ESEXP) of the DCNAeHCC (4q and 13q) genes are shown for YSHCC (n = 32; E), GSE65373 (n = 38; F), or TCGA-LIHC (n = 365; G), respectively. H, A Kaplan–Meier plot analysis of overall survival (OS) between the patients with gain (ESEXP > 0 and ESCNA > 0; n = 173) and loss (ESEXP < 0 and ESCNA < 0; n = 114) of the DCNAeHCC (4q and 13q) genes is shown for TCGA-LIHC data (bottom right).
To further evaluate the functional roles of DCNAeHCC at 4q and 13q, we calculated the ES values of DCNAeHCC genes at 4q and 13q (n = 327) in DNA copy numbers and mRNA expression levels, respectively. Previously, the genes showing DNA copy number–dependent transcriptional dysregulation have been shown to play functional driving roles in cancer progression (12, 25). As expected, the ES values of CNAs (ESCNA) were correlated with the ES values of the corresponding gene expression (ESEXP; r = 0.37, P = 0.03; Pearson correlation test), indicating that the expression of the 4q and 13q region was dependent to its DNA copy number aberration (Fig. 2E). This finding was validated in the two independent datasets of GSE65373 (n = 38; r = 0.51, P = 9.76 × 10−4) and TCGA-LIHC (n = 365; r = 0.73, P = 3.51 × 10−63; Fig. 2F and G).
Next, to evaluate the clinical relevance of the DCNAeHCC at 4q and 13q, we selected patients who exhibited high correlation between ESCNA and ESmRNA,, and stratified them into High- and Low-groups. The High group represented the patients who showed DCNAeHCC gains and transcriptional upregulation of their corresponding genes (ESEXP > 0 and ESCNA > 0; n = 173), whereas the Low group represented the patients who showed DCNAeHCC losses with suppression of corresponding genes (ESEXP < 0 and ESCNA < 0; n = 114). As 4q and 13q genes harbored many TSGs, we hypothesized that the High group may exhibit a better prognostic outcome compared that of the Low group. As expected, there was a clinical distinction between the groups, showing better clinical outcome of the High group compared to that of the Low group (HR = 0.32, P = 0.019, log-rank test; Fig. 2H). Taken together, these results suggest that CNA losses at 4q and 13q along with concomitant transcriptional dysregulation may play a potential driving role in eHCC development.
Somatic mutations were acquired during stepwise hepatocarcinogenesis
Next, we identified recurrent somatic nonsynonymous or nonsense mutations from the RNA-seq data (n = 1,824). No significant difference in the mutation frequencies were evident in group comparisons (Supplementary Fig. S7). Among the mutations, newly identified mutations in each group that were not present in the precedent group were designated as “differentially mutated genes” (DMUT) for each of HGDN (DMUTHGDN; n = 156), eHCC (DMUTeHCC; n = 34), and pHCC (DMUTpHCC; n = 11), respectively (Fig. 3A, for details see Materials and Methods). Next, we demonstrated the mutation status of the genes including mutations identified in oncogenes (e.g., ERBB2 and CTNNB1) and TSGs (e.g., RB1CC1 and TP53) among the DMUT genes, as well as the significant mutations identified in TCGA-LIHC data (n = 26), which may potentially play key roles in HCC development and progression (Fig. 3B). Genes harboring well-known trunk mutations, such as TP53 and CTNNB1, were detected in eHCC, as substantiated by the findings of previous studies. In addition, we identified a new trunk mutation in BAP1 (BRCA1-associated protein-1) in eHCC, which was previously suggested as a driver mutation in HCC (12). These results consistently indicated a potential driver role for trunk mutations in HCC development.
Mutation alterations during stepwise hepatocarcinogenesis. A, A diagram showing the number of DMUT in each group. B, A heatmap showing mutations including the oncogenes and TSGs in DMUT and the significantly mutated genes from TCGA-LIHC. Top, mutations at TERT promoter region are shown. C, Mutation frequencies in the driver mutation pathways are shown.
Mutation alterations during stepwise hepatocarcinogenesis. A, A diagram showing the number of DMUT in each group. B, A heatmap showing mutations including the oncogenes and TSGs in DMUT and the significantly mutated genes from TCGA-LIHC. Top, mutations at TERT promoter region are shown. C, Mutation frequencies in the driver mutation pathways are shown.
We also examined mutations in the promoter region of TERT, which have been observed to play key roles in hepatocarcinogenesis. Congruent with a previous study (26), mutation frequencies were increased along with the group for LC (2/30; 6.6%), HGDN (4/28; 14.2%), eHCC (5/28; 17.8%), and pHCC (12/43; 27.9%). However, the overall mutation frequency was much lower what was observed in the previous study (61% in eHCC and 64% in pHCC; ref. 26), which may be due to cohort differences between studies. In fact, while the majority of patients in our cohort were patients with HBV etiologic factor, the previous study cohort was mostly Caucasian and consists of etiologic factors other than HBV.
In addition, we examined the driver-mutation pathways, which have been shown to mutate frequently in liver cancers (Fig. 3C; ref. 27). HGDN showed recurrent mutations in the oxidative-stress pathway (4/28, 14%). Compared with precancerous tissues, eHCC and pHCC tissues showed enriched mutations in the genes associated with cancer-driver pathways, such as chromatin regulators, TP53/cell cycle, WNT, and Ras/extracellular signal–regulated kinase. Mutations in AKT/mTOR (mTOR) pathway were found only in pHCC, implying a late event. These results demonstrated dynamic and sequential accumulations of pathway mutations during hepatocarcinogenesis.
Transcriptome alterations during stepwise hepatocarcinogenesis
Epigenomic and genomic alterations may ultimately reprogram the transcriptomic architecture of cancers. To evaluate aberrations at the transcriptome-level, we constructed a pooled gene-expression profile (n = 293), including YSHCC (n = 131) and public datasets of multistep hepatocarcinogenesis [i.e., GSE89377 (n = 87) and GSE6764 (n = 75)]. Pairwise comparison of each group identified differentially expressed genes (DEG) for LC (n = 2,006), DN (n = 874), eHCC (n = 925), and pHCC (n = 1,736), respectively (permuted t test; P < 0.05 and fold difference > 0.3; Fig. 4A). Gene ontology analysis revealed stepwise functional alterations among groups (Supplementary Fig. S8A), with the LC group exhibiting enriched expression of inflammation-related genes (e.g., those associated with type I IFN signaling; ES = 15.05; P = 1.81 × 10−18), whereas the HGDN group exhibited enriched expression of endoplasmic reticulum (ER) stress–related genes (ES = 5.12; P = 9.85 × 10−10). Furthermore, eHCC and pHCC groups exhibited stepwise expression of cell cycle–related genes (eHCC: ES = 4.60 and P = 4.52 × 10−12; pHCC: ES = 33.06 and P = 8.57 × 10−66).
Transcriptomic alterations during stepwise hepatocarcinogenesis. A, A heatmap showing the expression of the DEGs among the groups (top). Enrichment scores for each of the gene signatures of immune, TGFβ, ER stress, oncogene, and TSG are calculated in each sample. Active and exhausted immune types were determined by applying the NTP algorithm based on the expression of “Activated stroma” and “Normal stroma” signatures (FDR < 0.05). TGFβ activity is estimated with early- and late- TGFβ signatures, respectively, and the expression transition from early- to late-TGFβ signatures (ESlate-TGFB - ESearly-TGFB) is plotted. Onco-activity is calculated as ESoncogenes - ESTSGs. B, A bar plot shows the proportion of immunotypes in each group. C, Gene set enrichment analysis for the ER-stress genes between HGDN and LC groups (left) and oncogenes between eHCC and HGDN groups (right) are shown.
Transcriptomic alterations during stepwise hepatocarcinogenesis. A, A heatmap showing the expression of the DEGs among the groups (top). Enrichment scores for each of the gene signatures of immune, TGFβ, ER stress, oncogene, and TSG are calculated in each sample. Active and exhausted immune types were determined by applying the NTP algorithm based on the expression of “Activated stroma” and “Normal stroma” signatures (FDR < 0.05). TGFβ activity is estimated with early- and late- TGFβ signatures, respectively, and the expression transition from early- to late-TGFβ signatures (ESlate-TGFB - ESearly-TGFB) is plotted. Onco-activity is calculated as ESoncogenes - ESTSGs. B, A bar plot shows the proportion of immunotypes in each group. C, Gene set enrichment analysis for the ER-stress genes between HGDN and LC groups (left) and oncogenes between eHCC and HGDN groups (right) are shown.
To further characterize the expression of immune-related genes, we examined previously defined immune-related gene signatures, such as “HCC immune” (n = 112), “Senescence” (n = 50), “Senescence-associated secretory phenotype” (SASP, n = 60), and “Cytokine pathway” (n = 187; Supplementary Table S3). We found that the LC group was immune-active, showing the highest expression of immune-related and senescence-related genes, which may provide a microenvironmental milieu for carcinogenesis (Fig. 4A, top; Supplementary Fig. S8B; ref. 28). In addition, we analyzed the immunotypes, “active-immune” and “exhausted-immune,” which were recently reported as playing distinct roles in HCC progression (29). Prediction of the immunotypes in each sample by using Nearest Template Prediction (NTP) algorithm (30) revealed that LC predominantly expressed the “active-immune” type (19/55; 34.54%), whereas eHCC and pHCC expressed the “exhausted-immune” type (17/53 of eHCC, 32.07%; 29/95 of pHCC, 30.52%; Fig. 4B). This indicated that eHCC undergoes a transition in immune types, which may facilitate immune evasion of tumor cells. With respect to this finding, TGFβ signaling regulates tumor–stroma interactions and suppresses host immune response by inducing T-cell exhaustion (31). Moreover, TGFβ acts as a tumor suppressor at the early stages of tumor development expressing the early TGFβ signature, whereas, at later stages, TGFβ possesses tumor-promoting potential expressing the late TGFβ signature (32, 33). Congruent with immunotype conversion, eHCCs showed a transition from early to late TGFβ-related signaling, as demonstrated by the transition scores associated with TGFβ signaling (i.e., ESlate-TGFB − ESearly-TGFB; Fig. 4A, middle). Such transition in TGFβ activity during hepatocarcinogenesis may indicate the induction of tumor-promoting activity, as well as exhausted immune activity.
Gene set enrichment analysis validated the stepwise expression of ER stress–related genes and oncogenes. HGDN exhibited prominent expression of ER stress–related genes associated with “Response to ER stress” (n = 233) or the “Unfolded-protein response” (n = 113; Fig. 4A, middle). Subsequently, eHCC showed enriched expression of oncogenes and counter-balanced suppression of TSGs. Calculation of onco-activity based on the differentially enriched activation of oncogenes and suppression of TSGs (i.e., ESoncogene − ESTSG) revealed that onco-activity transitioned from tumor-suppressive (negative onco-activity) to oncogenic (positive onco-activity) activity during eHCC development. A groupwise comparison of gene set enrichment analysis confirmed the differential expression of ER stress–related genes during DN development (ES = 0.45, P = 0.01; Fig. 4C, left) and oncogene activation during eHCC development (ES = 0.30, P = 0.05; Fig. 4C, right). These results suggested that stepwise expression of immune-related and ER stress–related genes in LC and HGDN might provide a microenvironmental milieu that is favorable for eHCC development.
Serine protease inhibitor Kazal type 1 is a potential key driver of eHCC development
To delineate key regulators modulating malignant conversion, we constructed a genetic network of DEGs in HGDN (DEGHGDN; n = 143) and eHCC (DEGeHCC; n = 277) to determine their functional enrichment of ER stress–related genes and oncogenes. Our analysis revealed 5 genes [i.e., serine protease inhibitor Kazal type 1 (SPINK1), RRAGD, CLDN15, CAP2, and ARK1C3] that potentially bridge DEGHGDN and DEGeHCC subsets (Fig. 5A). Among them, we focused in SPINK1, CAP2, and RRAGD, which were differentially expressed during multistep hepatocarcinogenesis (permuted t test, P < 0.05 and fold difference > 0.3; Supplementary Fig. S9A). Stepwise expression of these genes was validated in the pooled data (n = 293) and each dataset of YSHCC (n = 131), GSE89377 (n = 87), and GSE6764 (n = 75), respectively (Fig. 5B; Supplementary Fig. S9B). Moreover, these genes showed tumor-specific expression as compared with the levels in nontumor tissues in TCGA-LIHC (n = 421), GSE14520 (n = 486), GSE25097 (n = 557), and GSE87630 (n = 94; Supplementary Fig. S9C). Our findings suggested that SPINK1, CAP2, and RRAGD might play potential oncodriver roles during hepatocarcinogenesis.
SPINK1 expression is a putative marker of eHCC development. A, Genetic networks of DEGHGDN and DEGeHCC genes were constructed with the interactions of physical, genetic, and pathways by using GeneMania software. ER stress-related genes in DEGHGDN and the oncogenes in DEGeHCC are indicated, and their connectors are shown. B, Top, a heatmap showing the stepwise expression of SPINK1, CAP2, and RRAGD during hepatocarcinogenesis. Bottom, boxplots showing the distribution of expression levels of SPINK1, CAP2, and RRAGD in the pooled data, YSHCC, GSE89377, and GSE6764, respectively. C, Kaplan–Meier plot analyses of overall survival (OS) and recurrence-free survival (RFS) for the patient groups stratified by above or below the average expression levels of SPINK1 in each HCC cohort of GSE14520 and TCGA-LIHC, respectively. Follow-up time for OS or RFS is truncated to 5 years. D, Left, IHC staining of hematoxylin and eosin (H&E; top) and SPINK1 (bottom) in LC, LGDN, HGDN, eHCC, and pHCC specimens is shown. Right, a bar plot shows the semiquantitative positivity of SPINK1 expression in LC, LGDN, HGDN, eHCC, and pHCC specimens.
SPINK1 expression is a putative marker of eHCC development. A, Genetic networks of DEGHGDN and DEGeHCC genes were constructed with the interactions of physical, genetic, and pathways by using GeneMania software. ER stress-related genes in DEGHGDN and the oncogenes in DEGeHCC are indicated, and their connectors are shown. B, Top, a heatmap showing the stepwise expression of SPINK1, CAP2, and RRAGD during hepatocarcinogenesis. Bottom, boxplots showing the distribution of expression levels of SPINK1, CAP2, and RRAGD in the pooled data, YSHCC, GSE89377, and GSE6764, respectively. C, Kaplan–Meier plot analyses of overall survival (OS) and recurrence-free survival (RFS) for the patient groups stratified by above or below the average expression levels of SPINK1 in each HCC cohort of GSE14520 and TCGA-LIHC, respectively. Follow-up time for OS or RFS is truncated to 5 years. D, Left, IHC staining of hematoxylin and eosin (H&E; top) and SPINK1 (bottom) in LC, LGDN, HGDN, eHCC, and pHCC specimens is shown. Right, a bar plot shows the semiquantitative positivity of SPINK1 expression in LC, LGDN, HGDN, eHCC, and pHCC specimens.
We then evaluated functional and clinical relevance of the candidate genes. SPINK1 expression was consistently correlated with clinical outcomes of overall survival or recurrence-free survival in independent studies (GSE14520 and TCGA-LIHC; Fig. 5C). Moreover, univariate and multivariate analyses of TCGA data also revealed the prognostic significance of SPINK1 expression in TCGA-LIHC data [HR = 1.52; 95% confidence interval (CI), 1.07–2.15; P = 0.018, univariate analysis; HR = 1.64; 95% CI, 1.11–2.43; P = 0.013, multivariate analysis; Supplementary Table S5]. Cell culture experiments also indicated that the overexpression of SPINK1 promoted proliferation as well as invasion activities in liver cancer cells (Supplementary Fig. S10A–S10C). Accordingly, we prioritized SPINK1 as a potential driver in the following analyses.
Clinical utility of SPINK1 was also evaluated via IHC staining of the 171 tissues, which included LC (n = 30), LGDN (n = 30), HGDN (n = 40), eHCC (n = 40), and pHCC (n = 41) specimens (Supplementary Table S2). We observed that SPINK1 protein levels gradually increased in eHCC (6/40; 15%) and pHCC (22/41; 54%) but were not detectable in precancerous lesions (Fig. 5D). This finding suggested that SPINK1 exhibits potential clinical utility as a differential diagnostic marker for distinguishing eHCC from noncancerous lesions.
SPINK1 expression is regulated by DNA methylation
Previous integrative multiomic studies demonstrate that genes exhibiting transcriptional dysregulation and concomitant epigenomic or genomic aberrations play pivotal roles in cancer progression. Therefore, we considered genes that were inversely correlated with DNA methylation to be indicative of DNA methylation–dependent transcriptional dysregulation (METcor, n = 48; Z-transformed correlation coefficient, r < −1.96; one-way Fisher–Pitman permutation test, P < 0.05; Supplementary Fig. S11A). Interestingly, SPINK1 was identified again as a DNA methylation-dependent gene (Fig. 6A). In addition, stepwise demethylation of SPINK1 at exon 1 (cg04577715) during hepatocarcinogenesis was observed in the YSHCC and GSE44970 datasets, respectively (Fig. 6B), and HCC-specific demethylation of SPINK1 relative to its status in non-tumor tissues was confirmed in multiple data sets [i.e., GSE60753 (n = 143), GSE73003 (n = 40), SNUHCC (HCC data from Seoul National University, n = 64), and TCGA-LIHC (n = 427); Supplementary Fig. S11B]. Moreover, DNA methylation-dependent regulation of SPINK1 expression was validated by the inverse correlation seen between DNA methylation and SPINK1 expression in the pooled dataset (n = 467; r = −0.49, P < 0.001), which included TCGA-LIHC and GSE87630 (Fig. 6C). Furthermore, we confirmed these findings via cell culture experiments, wherein transfection of DNA methyltransferase 1 (DNMT1) into liver cancer cell lines significantly suppressed SPINK1 expression (Fig. 6D; Supplementary Fig. S12). In contrast, treatment of these cells with a demethylating agent (5-aza-deoxycytidine) markedly upregulated SPINK1 expression (Fig. 6E). These findings strongly indicated that SPINK1 expression was regulated by DNMT1-dependent DNA methylation.
SPINK1 expression is regulated by DNA hypomethylation. A, Venn diagram showing the overlap of SPINK1 between the DMPs and METcor genes. B, Stepwise methylation of SPINK1 during multistep hepatocarcinogenesis is shown in YSHCC and GSE44970, respectively. C, Inverse correlation of SPINK1 expression with its DNA methylation levels is shown in pooled HCC including YSHCC, LIHC-TCGA, and GSE87630. D, SPINK1 mRNA expression levels in the indicated cells with stable transfection of vector or DNMT1 were measured by quantitative RT-PCR. E, 5-aza-DC (5-aza-deoxycytidine, 10 μmol/L) was treated on the liver cancer cells (Huh7, Hep3B, and HepG2), and the SPINK1 mRNA expression levels were measured. F–H, Liver cancer cell lines (Huh7, Hep3B, HepG2) were treated with three ER stress inducers including thapsigargin (Thap, 1 μmol/L) for 72 hours, tunicamycin (Tuni, 1 μg/mL) for 24 hours, and dithiothreitol (DTT, 1 mmol/L) for 24 hours. SPINK1 expression level was measured by quantitative RT-PCR (F), and the protein levels of DNMT1 (G) and NF-κB signaling pathway (H) were measured by immunoblot. I, NF-κB signaling in the indicated cells with stable transfection of vector or DNMT1 was measured by immunoblot. Statistical significance is indicated. *, P < 0.05; **, P < 0.01; ***, P < 0.001, Student t test.
SPINK1 expression is regulated by DNA hypomethylation. A, Venn diagram showing the overlap of SPINK1 between the DMPs and METcor genes. B, Stepwise methylation of SPINK1 during multistep hepatocarcinogenesis is shown in YSHCC and GSE44970, respectively. C, Inverse correlation of SPINK1 expression with its DNA methylation levels is shown in pooled HCC including YSHCC, LIHC-TCGA, and GSE87630. D, SPINK1 mRNA expression levels in the indicated cells with stable transfection of vector or DNMT1 were measured by quantitative RT-PCR. E, 5-aza-DC (5-aza-deoxycytidine, 10 μmol/L) was treated on the liver cancer cells (Huh7, Hep3B, and HepG2), and the SPINK1 mRNA expression levels were measured. F–H, Liver cancer cell lines (Huh7, Hep3B, HepG2) were treated with three ER stress inducers including thapsigargin (Thap, 1 μmol/L) for 72 hours, tunicamycin (Tuni, 1 μg/mL) for 24 hours, and dithiothreitol (DTT, 1 mmol/L) for 24 hours. SPINK1 expression level was measured by quantitative RT-PCR (F), and the protein levels of DNMT1 (G) and NF-κB signaling pathway (H) were measured by immunoblot. I, NF-κB signaling in the indicated cells with stable transfection of vector or DNMT1 was measured by immunoblot. Statistical significance is indicated. *, P < 0.05; **, P < 0.01; ***, P < 0.001, Student t test.
As shown in Fig. 5A, ER stress in HGDN may promote oncogene activation in eHCC. To test this hypothesis, we evaluated whether ER stress affected DNA methylation–mediated SPINK1 expression. Upon treating liver cancer cells with ER stress inducers, such as thapsigargin, tunicamycin, or dithiothreitol (DTT), SPINK1 expression was upregulated while DNMT1 levels were significantly downregulated (Fig. 6F and G). Induction of ER stress in the cells was confirmed by examining the expression levels of GRP78 (glucose-regulated protein, 78 kDa), an ER-stress marker (Supplementary Fig. S13). These results suggested that SPINK1 expression was regulated by epigenetic DNA methylation, which was promoted by induced ER stress.
In addition, NF-κB is reported to be a regulator of SPINK1 expression (34), while ER stress acts as an activator of NF-κB signaling (35, 36). Therefore, we examined the role of NF-κB in ER stress–induced SPINK1 expression. However, we observed that neither ER stress induction nor DNMT1 overexpression affected NF-κB signaling (Fig. 6H and I). Moreover, short-hairpin RNA-mediated knockdown of NF-κB p65 levels showed no effect on SPINK1 expression, indicating that NF-κB signaling was not involved in the ER stress–induced SPINK1 expression (Supplementary Fig. S14A–S14C).
Discussion
In this study, we performed multiomics profiling and integrative analyses to comprehensively demonstrate the sequential dynamics underlying multilayered aberrations in DNA methylation, DNA copy numbers, mutations, and transcriptional activities during hepatocarcinogenesis. Our findings are summarized in Fig. 7.
Graphical summary of the dynamics of genomic, epigenomic, and transcriptomic aberrations during stepwise hepatocarcinogenesis.
Graphical summary of the dynamics of genomic, epigenomic, and transcriptomic aberrations during stepwise hepatocarcinogenesis.
Our results demonstrated that aberrant DNA methylation occurred during eHCC and pHCC development. In particular, we observed initial DNA methylation in eHCC (e.g., DKK3, SALL3, and SOX1) and further profound methylation in pHCC (e.g., CCNA1, HOXA9, WT1, PENK, TRIM58, and TBX4). Remarkably, we observed that DNA methylation process was initiated from CGI-flanking regions to CGIs during hepatocarcinogenesis.
DNA copy number analysis revealed loss in DNA copy numbers initiated during eHCC, particularly in the TSGs (e.g., RB1) of 4q and 13q regions. This finding contrasts with previous results showing that small HCCs exhibit copy number gains at 1q and 8q (6.9%; ref. 9). Indeed, we also observed marginal levels of copy number gains at 1q and 8q (3/9 of eHCC samples; Fig. 2C); however, small HCC is defined as a tumor <2.0 cm in diameter, whereas eHCC is a histologically well-differentiated subgroup of small HCC that shows indistinct margins and represents the earliest tumor lesions (37). Hence, it is plausible that the subtle alterations at 4q and 13q in eHCC may not be discernible in the extended group of small HCCs. We are currently investigating this issue in order to validate our findings via large-scale studies of whole-genome sequencing data.
In-depth transcriptome analysis revealed stepwise reprogramming that induced increased immune activity in precancerous lesions. Increasingly, evidence indicates that the immune system affects tumor initiation as well as progression, and that complex interactions that occur between immune cells and cancer cells are capable of both inhibiting or enhancing tumor growth (38). The active-immune reactions taking place in precancerous tissues may participate in cancer-specific immune editing to eliminate tumor cells. During malignant conversion, eHCC was reprogrammed to express exhausted-immune and late-TGFβ signaling, which may be considered as a hallmark of eHCC development. Because TGFβ activity represents a primary mechanism that enables immune evasion in tumor microenvironment (39), we suggest that the transition of expression from an early TGFβ signature to a late one may promote immune evasion by suppressing microenvironmental immune surveillance in eHCC.
In addition to immune-related activities, precancerous HGDN expressed ER stress–related genes. In fact, a hepatitis viral infection in the liver tissues activates the immune system and alters mitochondrial function, which enhances oxidative DNA damage, promoting ER stress in cells (40). The resulting inflamed and damaged cells activate cell proliferation‐-related signals and promote tumor formation (41). Indeed, sustained ER stress in cancers fosters an immunosuppressive and protumorigenic microenvironment (42). Moreover, our unsupervised analysis suggested that the induction of ER stress in precancerous tissues activates oncogenes, thereby promotes eHCC progression. SPINK1 was found as a potential driver gene linking the functional domains of ER stress to oncogene activation during eHCC development (Fig. 5A). As the limited data availability of LGDN and HGDN samples, we did not further analyze the detailed difference between them. Moreover, LGDN and HGDN showed similar expression pattern to each other compared with those of malignant lesions of eHCC or pHCC (see Fig. 4, and Supplementary Figs. S7 and S8).
SPINK1 exhibits tumor-promoting functions, revealing a clinical association in diverse cancer types, including breast, prostate, and pancreatic cancer (43, 44). In liver cancer, SPINK1 expression enhances aggressive phenotypes, such as tumor growth, portal vein invasion, and early recurrence (45). A recent study showed that stromal cells that produced SPINK1 following genotoxic treatment subsequently reprogrammed cancer cells and promoted EMT (38). However, the mechanisms associated with SPINK1 expression and its involvement in cancer initiation and clinical implications remain unclear. We found that SPINK1 expression was regulated by DNMT1-mediated DNA methylation. Therefore, we suggest that the elevated ER stress present in precancerous lesions may induce SPINK1 expression, which in turn promotes HCC development and progression. Furthermore, we demonstrated the clinical utility of SPINK1 as a potential immunostaining marker for differential diagnosis of eHCC from precancerous lesions. These results imply that the modulation of ER stress, SPINK1 expression, or SPINK1 methylation can be new therapeutic targets for HCC treatment, although further extended studies might be required.
In conclusion, our comprehensive analysis of genomic and epigenomic regulation during the stepwise development of HCC provides novel mechanistic and clinical insights into the development of novel diagnostics and therapeutics for HCC.
Disclosure of Potential Conflicts of Interest
No potential conflicts of interest were disclosed.
Authors' Contributions
Conception and design: H.G. Woo, Y.N. Park
Development of methodology: H.G. Woo, Y.N. Park
Acquisition of data (provided animals, acquired and managed patients, provided facilities, etc.): H. Rhee, J.H. Nahm, J.E. Yoo, G.H. Choi, Y.N. Park
Analysis and interpretation of data (e.g., statistical analysis, biostatistics, computational analysis): B.A. Jee, J.-H. Choi, S.M. Kwon, G.H. Choi, H.G. Woo, Y.N. Park
Writing, review, and/or revision of the manuscript: J.-H. Choi, H. Rhee, S. Yoon, G.H. Choi, H.G. Woo, Y.N. Park
Administrative, technical, or material support (i.e., reporting or organizing data, constructing databases): H. Rhee, J.E. Yoo, Y. Jeon, Y.N. Park
Study supervision: H.G. Woo, Y.N. Park
Acknowledgments
This research was supported by grants from the National Research Foundation of Korea (NRF) funded by the Korean government (MSIP; NRF-2017R1E1A1A01074733, NRF-2017M3A9B6061509, NRF-2017M3C9A6047620, NRF-2019R1A5A2026045, NRF-2017R1A2B4005871, NRF-2016M3A9D5A01952416, and NRF-2017M3A9B6061512).
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.