Abstract
Intratumoral hepatitis B virus (HBV) integrations and mutations are associated with hepatocellular carcinoma (HCC) progression. Circulating cell-free DNA (cfDNA) has emerged as a powerful noninvasive biomarker for cancer. However, the landscape of HBV integrations and mutations in cfDNA remains unclear.
A method based on cSMART (Circulating Single-Molecule Amplification and Resequencing Technology), termed SIM, was developed to investigate HBV integration and mutation landscapes in cfDNA simultaneously, using HBV-specific primers covering the whole HBV genome. Patients with HCC (n = 481) and liver cirrhosis (LC; n = 517) were recruited.
A total of 6,861 integration breakpoints, including breakpoints in TERT and KMT2B, were discovered in HCC cfDNA, more than in LC cfDNA. The concentration of circulating tumor DNA (ctDNA) correlated positively with both the detection rate of these integration hotspots and the total number of HBV integration events in cfDNA. To trace the origin of HBV integrations in cfDNA, whole-genome sequencing (WGS) was performed on the paired tumor tissues. A paired comparison of tissue WGS data with cfDNA SIM data confirmed that most recurrent integration events in cfDNA originated from tumor tissue. The mutational landscape across the whole HBV genome was generated for the first time for both HBV genotypes C and B. A region from nt1100 to nt1500 containing multiple HCC risk mutation sites (OR > 1) was identified as a potential HCC-related mutational hot zone.
Our study provides an in-depth delineation of HBV integration and mutation landscapes at the cfDNA level, together with a comparative analysis against paired tissues. These findings support the feasibility of noninvasive detection of viral insertions and mutations.
Hepatitis B virus (HBV) integration is related to hepatocellular carcinoma (HCC) progression, and approximately 90% of HBV-related HCC cases have been reported to carry HBV integrations. HBV mutations, such as those at nt1762/1764, have been reported to correlate with HCC development and patient survival. However, because of cost and the limitations of current technology, viral genome-wide detection of mutations and integrations in circulating cell-free DNA (cfDNA) from patients with HCC and liver cirrhosis (LC) has been little studied. To this end, we developed a Circulating Single-Molecule Amplification and Resequencing Technology (cSMART)-based method (SIM) to detect HBV integrations and mutations simultaneously at the cfDNA level. The HBV integration and mutation landscapes in cfDNA from 481 patients with HCC and 517 patients with LC were delineated. A total of 6,861 integration events were identified in HCC cfDNA, and a region from nt1100 to nt1500 of the HBV genome was identified as an HCC-related mutational hot zone. Our study provides evidence that intratumoral HBV integrations can be detected noninvasively and describes detailed mutational patterns on the HBV genome and their clinical relevance.
Introduction
Human hepatitis B virus (HBV) infection is one of the most common etiologies of liver cirrhosis (LC) and hepatocellular carcinoma (HCC). Approximately 240 million people are living with chronic HBV infection worldwide (1), and HBV-related hepatitis causes up to 1.5 million deaths annually (1). Although prophylactic vaccines against HBV infection are now widely available, there is currently no cure for chronic HBV infection.
HBV integration is closely related to HCC progression, and approximately 90% of HBV-related HCC cases have been reported to carry HBV DNA integrations (2). In comparison, HBV integration events are detected less often in adjacent nontumor tissues (3). Several mechanisms have been proposed for the contribution of HBV integration to hepatocarcinogenesis, including the formation of persistent templates for HBV gene expression, induction of chromosomal instability, and disruption of cancer-associated genes (4). Although HBV integration occurs randomly in host cell genomes (5), the discovery of recurrent integration sites has pointed to specific driver genes that cause clonal expansion of hepatocytes, providing possible explanations for hepatocarcinogenesis. HBV integration in HCC tissues has been widely studied: recurrent integrations frequently occur in the genes encoding human telomerase reverse transcriptase (hTERT), mixed-lineage leukemia 4 (MLL4), and cyclin E1 (CCNE1; ref. 3). Notably, one gene, fibronectin 1 (FN1), was recurrently affected by HBV integration in adjacent nontumor tissues (3).
HBV belongs to the Hepadnaviridae family, which consists of enveloped viruses with a partially double-stranded DNA genome of 3.2 kb. The HBV genome contains four partially overlapping open reading frames (ORF): C, encoding the nucleocapsid (core) protein (HBcAg) and the secreted e antigen (HBeAg); P, encoding the polymerase (Pol); S, encoding the envelope proteins; and X, encoding a transcriptional trans-activator protein. Although HBV mutations may occur throughout the genome, most published studies have focused on the basal core promoter (BCP) region (nt1751–1769; ref. 6), which is mainly regulated by enhancer II (nt1636–1744; ref. 6) and controls transcription of the precore mRNA and pregenomic RNA (7). A1762T/G1764A is the most commonly studied mutation hotspot in the BCP region (8, 9) and has been shown to be a crucial biomarker for identifying a subset of HBV-infected patients at extremely high risk of developing HCC (10–12). However, the mutation landscape across the whole HBV genome remains understudied.
Circulating cell-free DNA (cfDNA) has emerged as a powerful noninvasive, blood-derived biomarker for early cancer detection, monitoring of cancer development, and prediction of cancer prognosis (13). In healthy people, plasma cfDNA derives mainly from hematopoietic cells. In patients with cancer, however, it can also be released from tumor cells during apoptosis or necrosis (14, 15); this tumor-derived fraction is called circulating tumor DNA (ctDNA). ctDNA accounts for anywhere from <0.1% to >90% of cfDNA (16), and the quantity of ctDNA can reflect tumor development in patients with cancer (17). Recent studies found that the genetic and epigenetic information carried by cfDNA holds great diagnostic and prognostic value in HCC (18, 19). However, the HBV integration/mutation landscape in HCC cfDNA remains unknown. Whole-genome sequencing (WGS) would be the ideal way to achieve this goal, but its high cost limits its application. Hence, a cost-effective method that can simultaneously detect the HBV integration and mutation landscapes in cfDNA is needed.
In this study, we developed a Circulating Single-Molecule Amplification and Resequencing Technology (cSMART)-based method, a highly sensitive assay for simultaneously detecting viral DNA integration into host genomes and HBV mutations in cfDNA. Using this method, we analyzed cfDNA samples from 481 patients with HCC and 517 patients with LC. Our results revealed an HBV integration pattern unique to HCC cfDNA that was closely associated with the ctDNA fraction in cfDNA, and identified distinct HBV integration landscapes in patients with HCC and LC. Furthermore, an HCC-related zone (nt1100–nt1500) containing multiple mutation hotspots was found in the HBV genotype C genome.
Materials and Methods
Patients
A total of 998 patients diagnosed with HCC (n = 481) or LC (n = 517) were enrolled between June 2018 and June 2019. The study protocol was reviewed and approved by the institutional review board at all participating hospitals, and the study was performed in accordance with the principles of the Declaration of Helsinki. All participants provided written informed consent. Patients with HCC were enrolled from Eastern Hepatobiliary Surgery Hospital and Shanghai Zhongshan Hospital (Supplementary Table S1). Patients with LC were enrolled from 13 hospitals: (i) Eastern Hepatobiliary Surgery Hospital (Shanghai), (ii) the First Affiliated Hospital of Southern Medical University (Guangdong Province), (iii) Mengchao Hepatobiliary Hospital (Fujian Province), (iv) Subei People's Hospital (Jiangsu Province), (v) Ningbo No. 2 Hospital (Zhejiang Province), (vi) the Second Affiliated Hospital of Shandong University (Shandong Province), (vii) the First Affiliated Hospital of Xinjiang Medical University (Xinjiang Province), (viii) the First Affiliated Hospital of Jilin University (Jilin Province), (ix) Chifeng Municipal Hospital (Neimenggu Province), (x) Xuzhou No. 1 People's Hospital (Jiangsu Province), (xi) Southwest Hospital (Chongqing), (xii) The Central Hospital of Wuhan (Hubei Province), and (xiii) Xuzhou Infectious Disease Hospital (Jiangsu Province; Supplementary Table S1).
Study design
A novel cSMART assay was designed for simultaneous detection of HBV integration and HBV mutation in plasma. Plasma was separated from approximately 10 mL of blood by two rounds of centrifugation, and cfDNA was extracted using the MagMAX Cell-Free DNA Isolation Kit (Thermo Fisher Scientific). In a cSMART assay, each cfDNA molecule is uniquely barcoded with specially designed short barcode adapters and universally amplified to make duplicates of the original molecule. The amplification products are circularized and re-amplified with pools of target-specific back-to-back primer pairs to enrich the targets. Because of this primer design, the target sequences are amplified inversely, which preserves the molecular barcodes and the length information of the original cfDNA molecule. The inverse amplification step also enables the assay to detect novel HBV integration sites while maintaining high detection sensitivity: the primers extend outward from known HBV sequence around the circularized molecule, so unknown flanking host sequence is captured without prior knowledge of the integration site. The inversely amplified products are then ligated with TruSeq adapters (Illumina) and sequenced at very high depth (>20,000×) as 2 × 150 bp paired-end reads on an Illumina NextSeq 500 (Illumina). The original cfDNA molecules are reconstituted by a multistep bioinformatics pipeline, during which sequencing errors are corrected according to the consensus of the barcoded duplicates.
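The consensus-based error correction in the last step can be illustrated with a per-position majority vote inside each barcode family. The sketch below is a minimal illustration, not the cSMART pipeline: it assumes the reads of a family are already aligned and of equal length, and the function name `consensus_by_barcode` and the thresholds `min_family_size` and `min_agreement` are hypothetical.

```python
from collections import Counter, defaultdict


def consensus_by_barcode(reads, min_family_size=2, min_agreement=0.7):
    """Collapse barcoded duplicate reads into one consensus per original molecule.

    reads: iterable of (barcode, sequence) pairs. Sequences sharing a barcode
    are assumed to be pre-aligned and of equal length (a simplifying,
    hypothetical assumption). Positions where no base reaches `min_agreement`
    are masked as 'N'.
    """
    families = defaultdict(list)
    for barcode, seq in reads:
        families[barcode].append(seq)

    consensus = {}
    for barcode, seqs in families.items():
        if len(seqs) < min_family_size:
            continue  # a single read cannot be error-corrected, so drop it
        bases = []
        for column in zip(*seqs):  # one tuple of bases per position
            base, count = Counter(column).most_common(1)[0]
            bases.append(base if count / len(column) >= min_agreement else "N")
        consensus[barcode] = "".join(bases)
    return consensus
```

Dropping single-read families and masking low-agreement positions trades sensitivity for accuracy: a random sequencing error in one duplicate is outvoted by the others, whereas a true variant present in the original molecule is carried by every duplicate.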
HBV-specific primer design
To cover the whole HBV genome and improve detection sensitivity, HBV-specific primers were designed at high density across 54 regions. The primer sequences are listed in Supplementary Table S2. During library construction, the HBV-specific primers were divided into four primer pools, and primers within the same pool were kept more than 200 bp apart to reduce nonspecific amplification between primers. Because different HBV genotypes share highly homologous sequences, we designed the HBV-specific primers based on the sequences of HBV genotypes A, B, C, and D: in homologous regions, the same primers were applied to all HBV genotypes, whereas in heterogeneous regions, genotype-specific primers were designed.
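The within-pool spacing rule can be expressed as a greedy assignment on the circular HBV genome. This is a hypothetical sketch rather than the authors' design procedure: each primer is reduced to a single coordinate, the genome length of 3,215 nt is taken from the Results, and `assign_primer_pools` and `circular_distance` are illustrative names.

```python
def circular_distance(a, b, genome_length):
    """Shortest distance between two positions on a circular genome."""
    d = abs(a - b) % genome_length
    return min(d, genome_length - d)


def assign_primer_pools(positions, n_pools=4, min_spacing=200, genome_length=3215):
    """Greedily place each primer in the first pool whose members are all more
    than `min_spacing` nt away from it on the circular genome.

    Raises ValueError if a primer fits in none of the `n_pools` pools.
    """
    pools = [[] for _ in range(n_pools)]
    for pos in sorted(positions):
        for pool in pools:
            if all(circular_distance(pos, other, genome_length) > min_spacing
                   for other in pool):
                pool.append(pos)
                break
        else:  # no pool accepted the primer
            raise ValueError(f"no pool can accept the primer at nt{pos}")
    return pools
```

Because the assignment is greedy, it can fail on inputs for which a valid four-pool layout exists; a real design would also weigh primer dimers and melting temperatures.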
HBV breakpoint detection and mutation analysis
First, after low-quality and duplicate reads were removed, reads were reconstituted and assembled into contiguous consensus sequences. The consensus sequences were then mapped to the human reference genome and HBV genomes to obtain locus information: Burrows-Wheeler Aligner (BWA, v0.7.12) was used to align them to the human genome (NCBI build 37, hg19) and to HBV genomes (LC519798.1, LC500247.1, LC365290.1, LC519793.1, LC513651.1, LT935665.1, AB064310.1, and AB846650.1). Consensus sequences that mapped only to the human or only to an HBV reference genome were removed, and the remaining chimeric sequences were used to identify fusion breakpoints with custom software. The position of a breakpoint was defined as the junction of the human and HBV sequences in a consensus sequence. ChIPseeker (v1.24.0) was used to annotate the integration breakpoints.
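The junction definition can be sketched for the simplest case, a consensus sequence whose human and HBV parts each align as one ungapped, forward-strand segment. The custom software used in the study is not described, so the `Segment` class, the `find_breakpoint` function, the 0-based coordinates, and the `max_gap` tolerance for small gaps or microhomology at the junction are all assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Segment:
    """Hypothetical ungapped, forward-strand alignment of part of a consensus."""
    ref: str          # e.g. "chr5" or an HBV accession such as "AB064310.1"
    query_start: int  # 0-based, inclusive position on the consensus
    query_end: int    # 0-based, exclusive position on the consensus
    ref_start: int    # 0-based reference coordinate aligned to query_start

    def ref_pos(self, q: int) -> int:
        """Reference coordinate of consensus position q (ungapped alignment)."""
        return self.ref_start + (q - self.query_start)


def find_breakpoint(human: Segment, hbv: Segment, max_gap: int = 5) -> Optional[dict]:
    """Return the human and HBV coordinates flanking the junction, or None when
    the two segments are not adjacent on the consensus (gap or overlap > max_gap)."""
    left, right = sorted([human, hbv], key=lambda s: s.query_start)
    if abs(right.query_start - left.query_end) > max_gap:
        return None
    left_pos = left.ref_pos(left.query_end - 1)     # last base before the junction
    right_pos = right.ref_pos(right.query_start)    # first base after the junction
    human_pos = left_pos if left is human else right_pos
    hbv_pos = right_pos if left is human else left_pos
    return {"human_ref": human.ref, "human_pos": human_pos,
            "hbv_ref": hbv.ref, "hbv_pos": hbv_pos}
```

Real chimeric alignments also involve reverse-strand hits, soft clipping, and overlapping alignments, which this sketch ignores.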
The BAM file produced by BWA was used to call HBV mutations with Strelka (v2.8.4). To remove false-positive calls, mutation sites with a sequencing depth below 10× were filtered out. Mutation sites detected in more than 10% of patients and in more than 10 patients were regarded as hotspots. To avoid calling single-nucleotide polymorphisms, a site was excluded if, in over 90% of the patients carrying it, the variant was present on every read.
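The three filters (minimum depth, hotspot recurrence, and exclusion of fixed variants as likely polymorphisms) can be combined as follows. This is an illustrative sketch only: the input schema (one dict per patient and site with `depth` and `alt_reads`) and the function name `call_hotspots` are hypothetical, and "present on every read" is interpreted as a variant allele fraction of 1.

```python
from collections import defaultdict


def call_hotspots(calls, n_patients, min_depth=10, min_patient_freq=0.10,
                  min_patients=10, max_fixed_fraction=0.90):
    """Return {site: number of carrier patients} for sites passing all filters.

    calls: list of dicts with keys 'patient', 'site', 'depth', 'alt_reads'
    (hypothetical schema).
    """
    carriers = defaultdict(dict)  # site -> {patient: variant allele fraction}
    for c in calls:
        if c["depth"] < min_depth:
            continue  # too shallow to trust
        carriers[c["site"]][c["patient"]] = c["alt_reads"] / c["depth"]

    hotspots = {}
    for site, vafs in carriers.items():
        n = len(vafs)
        # Hotspot: more than 10 patients AND more than 10% of all patients.
        if n <= min_patients or n / n_patients <= min_patient_freq:
            continue
        # Variant on every read in most carriers -> likely a polymorphism.
        fixed = sum(1 for v in vafs.values() if v >= 1.0)
        if fixed / n > max_fixed_fraction:
            continue
        hotspots[site] = n
    return hotspots
```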
Ultra-low-pass WGS and ctDNA determination
In this study, ichorCNA (https://github.com/broadinstitute/ichorCNA/) was used to estimate the tumor fraction (ctDNA) of cfDNA samples. ichorCNA uses a hidden Markov model (HMM) to predict the ctDNA content from ultra-low-pass WGS (ULP-WGS; 0.1× coverage). First, low-depth WGS was performed on all samples, and BAM files were obtained by aligning the sequencing reads to the reference genome. The genome was divided into nonoverlapping 1-Mb bins; after filtering out repetitive and low-coverage regions and correcting for GC content, the number of reads in each bin was extracted as input for ichorCNA. The median read count in each autosomal bin across healthy individuals was used as the baseline. The fold change in read count per bin between each liver cancer sample and this baseline was calculated and log2-transformed to predict the amount of ctDNA in each sample.
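The normalization described above can be sketched as follows. It reproduces only the log2 ratio step and is not ichorCNA code; ichorCNA additionally fits an HMM to such ratios to infer copy-number states and tumor fraction. GC correction and repeat filtering are assumed to have been applied upstream, the per-profile median scaling (to cancel library size) is an assumption, and `log2_ratios` is a hypothetical name.

```python
import math
from statistics import median


def log2_ratios(sample_counts, normal_counts_panel):
    """Per-bin log2 copy ratio of one sample against healthy controls.

    sample_counts: GC-corrected read counts per 1-Mb bin for one sample
    (assumed positive). normal_counts_panel: a list of such profiles, one per
    healthy individual. Each profile is scaled to its own median before the
    per-bin median of the healthy profiles is used as the baseline.
    """
    def scale(counts):
        m = median(counts)
        return [c / m for c in counts]

    sample = scale(sample_counts)
    normals = [scale(profile) for profile in normal_counts_panel]
    baseline = [median(bin_values) for bin_values in zip(*normals)]
    return [math.log2(s / b) if s > 0 and b > 0 else float("nan")
            for s, b in zip(sample, baseline)]
```

A gained bin yields a positive ratio and a lost bin a negative one; the larger the tumor fraction, the further these ratios move from zero, which is what the HMM exploits.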
WGS on HCC tissue
Sequencing libraries were prepared from 500 ng of DNA. DNA samples were subjected to end repair/dA-tailing (5X ER/A-Tailing Enzyme Mix) and adaptor ligation (WGS Ligase), using adaptors specifically designed for the Illumina NovaSeq 6000 platform. After purification with Agencourt AMPure XP beads (Beckman Coulter), libraries were quantified with the KAPA Library Quantification Kit (Kapa Biosystems), and their size was confirmed on a Bioanalyzer (Agilent). Sequencing libraries were pooled in equal amounts, and WGS at an average coverage of 120× was performed on the Illumina NovaSeq 6000 platform (Illumina) with 2 × 150 bp paired-end reads.
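As a rough check on the sequencing effort, mean coverage is C = N × L / G for N reads of length L on a genome of size G. The sketch assumes G ≈ 3.1 × 10^9 bp for the human genome and ignores duplicates, unmapped reads, and overlapping mates, so it gives a lower bound; `read_pairs_for_coverage` is a hypothetical helper.

```python
def read_pairs_for_coverage(target_coverage, genome_size=3.1e9, read_length=150):
    """Paired-end read pairs needed for a mean coverage C = N * L / G.

    Assumes an approximate human genome size and perfectly usable reads
    (no duplicates, no unmapped reads, no overlapping mates).
    """
    bases_needed = target_coverage * genome_size
    return bases_needed / (2 * read_length)  # each pair contributes 2 * L bases
```

Under these assumptions, 120× with 2 × 150 bp reads corresponds to roughly 1.24 × 10^9 read pairs, whereas the 0.1× ULP-WGS used for ctDNA estimation needs only about 10^6.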
Reads were aligned to the UCSC human reference genome build 37 (hg19) and the HBV genomes using BWA (v0.7.12), and the resulting BAM file was used to identify integration breakpoints. A breakpoint was called when (i) at least four supporting reads and two supporting split reads were present for a usual breakpoint, or (ii) at least 10 supporting reads and five supporting split reads were present for an unusual breakpoint, and (iii) its frequency was ≥5%.
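The calling thresholds can be written as a simple filter. The text does not define "usual" versus "unusual" breakpoints, so that distinction is passed in as a flag; frequency is assumed to be a fraction between 0 and 1, and `passes_tissue_criteria` is a hypothetical name.

```python
def passes_tissue_criteria(total_reads, split_reads, frequency, usual=True):
    """Apply the stated WGS breakpoint thresholds.

    usual=True:  >= 4 supporting reads and >= 2 split reads.
    usual=False: >= 10 supporting reads and >= 5 split reads.
    Both cases also require a breakpoint frequency >= 5% (0.05).
    """
    min_reads, min_split = (4, 2) if usual else (10, 5)
    return total_reads >= min_reads and split_reads >= min_split and frequency >= 0.05
```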
Statistical analysis and data availability
Fisher's exact test was used to compare categorical variables. Continuous variables were log-transformed to approximate a normal distribution and then compared with Student's t test or analysis of variance. A P value < 0.05 was considered statistically significant. ROC curves were plotted using Python (version 2.7.16). Sequencing data are available from the corresponding authors upon request.
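The statistical software is not stated beyond Python for the ROC curves; `scipy.stats.fisher_exact` is one common implementation of Fisher's exact test. The dependency-free sketch below (hypothetical name `fisher_exact_two_sided`) shows what the two-sided test computes: the summed hypergeometric probability of all tables with the observed margins that are no more likely than the observed table.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test P value for the 2 x 2 table [[a, b], [c, d]].

    Integer weights proportional to the hypergeometric probabilities are used,
    so the 'no more likely than observed' comparison is exact.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)

    def weight(x):  # proportional to P(top-left cell == x) given the margins
        return comb(col1, x) * comb(n - col1, row1 - x)

    observed = weight(a)
    weights = [weight(x) for x in range(lo, hi + 1)]
    extreme = sum(w for w in weights if w <= observed)
    return extreme / sum(weights)
```

For example, the integration detection rates reported in the Results (321 of 481 HCC vs. 315 of 517 LC) correspond to `fisher_exact_two_sided(321, 160, 315, 202)`.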
Results
Simultaneous detection of HBV integration and mutation in cfDNA
To obtain the HBV integration and mutation landscapes in cfDNA, we developed a cSMART-based method that detects HBV integrations and mutations in cfDNA simultaneously (the SIM method). cSMART was originally developed to improve detection sensitivity for circulating DNA present in minute amounts, by sequencing inversely amplified target regions on circularized cfDNA molecules. cSMART is particularly useful for detecting single-nucleotide variants (SNV) and fusion events in cfDNA and has proven sensitive in multiple prenatal testing and cancer liquid biopsy studies (20–23). The HBV genome is small, 3,215 nt in length, and to cover the whole viral genome we designed a high-density primer panel. Because HBV genotypes share highly homologous sequences, genotype-specific primers were designed only for some heterogeneous regions (Supplementary Fig. S1A). In total, 54 groups of primers were designed and assigned to four primer pools to reduce nonspecific amplification between primers. Together, these components make up a novel cSMART-based assay for simultaneous detection and quantification of HBV integration events and HBV mutations in plasma (Fig. 1). The bioinformatics workflow is shown in Supplementary Fig. S1B.
HBV integration sites in HCC cfDNA
To explore the HBV integration landscape in the human genome using HCC cfDNA, we performed SIM analysis on cfDNA samples from a cohort of 481 pathologically diagnosed HCC cases: 421 males and 60 females, 383 HBsAg positive, and 308 with cirrhosis. HBV integration events were detected in 321 cases (Table 1; Supplementary Table S1), and a total of 6,861 HBV integration breakpoints were identified in the human genome. The fraction of ctDNA in cfDNA was determined as previously reported (24), using low-pass WGS data for all HCC samples acquired in our previous study (25). The ctDNA fraction was significantly higher in patients with HCC with detected HBV integration than in those without (Fig. 2A). Using a ctDNA fraction cutoff of 10%, the HCC cohort was divided into a ctDNAhigh group (107 cases, 75.7% with HBV integration) and a ctDNAlow group (374 cases, 64.2% with HBV integration). The integration-positive ctDNAhigh and ctDNAlow samples carried 3,704 and 3,157 HBV integration events in total (median 29 and 6 per sample), respectively. ctDNAhigh samples thus harbored a much higher frequency of HBV integration events than ctDNAlow samples (Fig. 2B), in accordance with the known associations of ctDNA concentration and HBV integration with HCC progression (3, 26). We also found that the number of HBV integration events in cfDNA correlated positively with the ctDNA fraction (Fig. 2C), indicating that cfDNA HBV integration, like ctDNA, may be linked to tumor biological characteristics.
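The stratification can be sketched as below. The input format and the function name `summarize_by_ctdna` are hypothetical; a fraction exactly at the 10% cutoff is assigned to the low group (an assumption), and the mean and median are taken over integration-positive samples only, following "integration-detected" in the text.

```python
from statistics import mean, median


def summarize_by_ctdna(patients, cutoff=0.10):
    """Split patients by ctDNA fraction and summarize HBV integration counts.

    patients: list of (ctdna_fraction, n_integrations) tuples (hypothetical format).
    """
    groups = {"ctDNAhigh": [], "ctDNAlow": []}
    for fraction, n_int in patients:
        groups["ctDNAhigh" if fraction > cutoff else "ctDNAlow"].append(n_int)

    summary = {}
    for name, counts in groups.items():
        positive = [n for n in counts if n > 0]  # integration-detected samples
        summary[name] = {
            "cases": len(counts),
            "detection_rate": len(positive) / len(counts) if counts else float("nan"),
            "total": sum(positive),
            "mean": mean(positive) if positive else 0.0,
            "median": median(positive) if positive else 0.0,
        }
    return summary
```

Applied to the reported totals, 3,704 events in the 81 integration-positive ctDNAhigh samples (75.7% of 107) give a mean of about 45.7 per sample, the figure cited in the Discussion.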
Variable | Patients, n (%) |
---|---|
Age (years) | 57.0 (21–84) |
Gender | |
Male | 421 (87.5) |
Female | 60 (12.5) |
HBsAg | |
Positive | 380 (79.5) |
Negative | 98 (20.5) |
AFP (>20 μg/L) | 257 (53.8) |
PIVKA-II (>40 mAU/mL) | 389 (84.2) |
CA19–9 (>39 U/mL) | 61 (12.8) |
LC | 308 (64.4) |
Tumor size (cm) | 6.4 (1–30) |
Tumor number | |
Single | 390 (81.1) |
Multiple | 91 (18.9) |
BCLC | |
0 | 19 (4.1) |
A | 49 (10.7) |
B | 133 (29.1) |
C | 256 (56.0) |
MVI | |
M0 | 201 (44.0) |
M1 | 139 (30.4) |
M2 | 117 (25.6) |
HBV integration | |
Yes | 321 (66.7) |
No | 160 (33.3) |
ctDNA | |
>0.1 | 107 (22.2) |
<0.1 | 374 (77.8) |
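Abbreviations: AFP, alpha fetoprotein; BCLC, Barcelona Clinic Liver Cancer stage; CA19–9, carbohydrate antigen 19–9; ctDNA, circulating tumor DNA fraction; HBsAg, hepatitis B surface antigen; LC, liver cirrhosis; MVI, microvascular invasion; PIVKA-II, protein induced by vitamin K absence or antagonist-II.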
Overall, most insertion sites were distributed randomly throughout the human genome (Fig. 2D). The integration map revealed a distinctive but widespread HBV integration landscape in both ctDNAhigh and ctDNAlow HCC. HBV integration hotspots were defined as genes recurrently (n > 5) affected by HBV integration in our study (Supplementary Table S3). Interestingly, in HCC cfDNA we identified not only integration hotspots previously reported in HCC tissues (3, 27), including TERT and KMT2B, but also hotspots previously reported only in adjacent nontumor tissue (27), such as FN1. The TERT and FN1 hotspots were found in both the ctDNAhigh and ctDNAlow groups at different frequencies (TERT, 27.2% in ctDNAhigh vs. 14.6% in ctDNAlow; FN1, 7.4% vs. 13.3%), whereas KMT2B was found in only one ctDNAhigh case. In addition, FRMD4B was found in five ctDNAlow cases, although it has never been reported as an integration hotspot in HCC tissues. The genomic locations of the tumor-specific HBV integration sites in the three recurrently affected genes are shown in Supplementary Fig. S2A. In both the ctDNAhigh and ctDNAlow groups, most TERT breakpoints were located within or near the promoter region, whereas most FN1 breakpoints were located in introns (Fig. 2E and F). These findings are consistent with the preferential breakpoint locations reported in HCC and adjacent normal tissues (3).
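The hotspot definition can be sketched as a count of distinct patients per gene. The text does not state whether n counts patients or breakpoints; this sketch assumes patients, and the input format and the name `integration_hotspots` are hypothetical.

```python
from collections import defaultdict


def integration_hotspots(breakpoints, min_patients=6):
    """Genes hit by HBV integration in at least `min_patients` patients (n > 5).

    breakpoints: iterable of (patient_id, gene) pairs from annotated breakpoints;
    gene is None for breakpoints without a gene assignment. A gene counts once
    per patient, however many breakpoints it carries in that patient.
    """
    patients_per_gene = defaultdict(set)
    for patient, gene in breakpoints:
        if gene:
            patients_per_gene[gene].add(patient)
    return {gene: len(p) for gene, p in patients_per_gene.items()
            if len(p) >= min_patients}
```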
Next, we investigated the correlation between the HBV integration landscape and clinical characteristics. In HCC, tumor-related parameters, including AFP, tumor size, BCLC stage, and microvascular invasion (MVI), were significantly higher or more advanced in the ctDNAhigh group (Fig. 2G).
The comparison of HBV integration patterns between HCC and LC cfDNA
LC is a predisposing factor for HCC. Cirrhosis promotes a series of genetic and epigenetic changes that result in the formation of dysplastic nodules, a premalignant stage of HCC (28). Although HBV infection is a main cause of LC, the HBV integration landscape in LC cfDNA remains unclear. We therefore performed SIM analysis on 517 LC samples and found HBV integration events in 315 cases. Interestingly, younger and male patients with LC were more likely to have HBV integration detected in cfDNA (Supplementary Table S4). HBV integration was detected less frequently in LC (315/517, 60.9%) than in HCC (321/481, 66.7%). The total number of genes with HBV integration was 2,141 in HCC and 864 in LC, and the average number of integration events per case was significantly lower in LC than in HCC (Fig. 3A), in accordance with the current understanding that HBV is a major etiologic agent in HCC development. The detection rate of TERT integration was also markedly higher in the HCC group (Fig. 3B). The 2,475 HBV integration events found in the 315 LC cases were distributed randomly across the whole genome, with only one hotspot, FN1 (Fig. 3C), as previously reported in HCC nontumor tissue (3). FN1 integration was detected in 16.5% (52/315) of LC samples, significantly more often than in HCC samples, whereas cfDNA integration sites found in HCC were rarely found in LC samples. As in HCC nontumor tissues (3), most FN1 breakpoints in LC cfDNA occurred in introns (Supplementary Fig. S3A). However, this integration hotspot showed no correlation with the clinical characteristics of patients with LC (Supplementary Fig. S3B). We also observed that the number of HBV integration events was significantly higher in HBeAg-positive patients in both HCC and LC (Supplementary Fig. S3C), suggesting that viral replication capacity is positively associated with the HBV integration detection rate in cfDNA.
In a previous study (27), Zhao and colleagues reported 88 genes recurrently (n > 1) affected by HBV integration in a cohort of 426 HCC tissues. About 40.9% (36/88) of these genes were also detected in our study (Fig. 3D), and when the criterion was changed to n > 2, the overlap rate increased to 60.9% (14/23), suggesting that HBV integration hotspots with higher recurrence are more readily detected at the cfDNA level.
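The overlap rates above are the fraction of reference genes passing a recurrence threshold that are also observed in cfDNA, which the following sketch computes. The function name `overlap_rate` and its input format are hypothetical; with the reported counts, the two thresholds give 36/88 and 14/23.

```python
def overlap_rate(reference_counts, observed_genes, min_recurrence):
    """Share of reference genes recurring more than `min_recurrence` times
    that were also observed in cfDNA.

    reference_counts: {gene: recurrence in the reference tissue cohort}.
    observed_genes: iterable of genes with integrations detected in cfDNA.
    Returns (shared, eligible, rate).
    """
    eligible = {g for g, n in reference_counts.items() if n > min_recurrence}
    shared = eligible & set(observed_genes)
    rate = len(shared) / len(eligible) if eligible else float("nan")
    return len(shared), len(eligible), rate
```

With toy inputs, raising the threshold drops weakly recurrent genes from the denominator, which is how the overlap rate can rise even though fewer genes are shared.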
The feature of HBV breakpoints
To better understand the HBV integration landscapes detected in HCC and LC cfDNA, we studied the preferential locations of breakpoints across the whole HBV genome (nt1–nt3215). First, we investigated the prevalence of different HBV genotypes in our cohorts. Sequencing of HBV DNA showed that genotype C was predominant in patients with HCC and LC with HBV integrations, accounting for 68.5% and 63.8%, respectively, followed by genotype B (21.5% and 19.4%, respectively). Thus, the distribution of HBV genotypes in LC was close to that in HCC, and the genotype prevalence detected in cfDNA was similar to that in tissue (27). We then surveyed the breakpoints on the HBV genome and found that their distribution was not random. Previous studies found that breakpoints in the HBV genome mostly occur at nt1400–nt1900, where the viral enhancer, X gene, and core gene are located (3); the locations of HBV breakpoints at the cfDNA level were similar to those in tissues but had their own features. In both patients with HCC and LC, breakpoints were enriched around nt1700–nt1900 in HBV genotype B, where the HBV X protein and core protein are encoded, and around nt1800–nt1900 in genotype C (Supplementary Fig. S4A). ctDNAhigh HCC harbored markedly more breakpoints between nt100 and nt200 in genotype C and between nt1000–nt1100 and nt2800–nt2900 in genotype B, but significantly fewer breakpoints between nt1800 and nt1900 in genotype C (Fig. 4A). The pattern in LC cfDNA was similar to that in ctDNAlow HCC. In addition, the integration level (normalized per HBV genome read) was significantly higher in HBV genotype C than in genotype B in both HCC and LC (Fig. 4B), opposite to the pattern in HCC tissue (27). We then annotated the HBV integration breakpoints to examine their distribution across distinct genomic elements (Supplementary Fig. S4B). Compared with LC cfDNA, HBV breakpoints in both ctDNAhigh and ctDNAlow HCC cfDNA were preferentially located in intergenic regions (P < 0.001). Furthermore, among gene-associated regions, the breakpoints of ctDNAhigh HCCs were significantly overrepresented in promoter regions (defined as 0 to –5 kb relative to the transcriptional start site; P < 0.0001). This integration site bias is similar to that observed inside HCC tumor tissues (27).
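The promoter definition used here can be illustrated with a simplified, strand-aware classifier. ChIPseeker performed the actual annotation; this hypothetical `classify_breakpoint` function is not its algorithm and assumes a plain list of gene intervals with 0-based, half-open coordinates, giving promoter hits priority over gene bodies.

```python
def classify_breakpoint(pos, genes, promoter_upstream=5000):
    """Classify a host breakpoint as 'promoter', 'genic' or 'intergenic'.

    genes: list of (start, end, strand) tuples on the breakpoint's chromosome,
    0-based half-open (hypothetical, simplified annotation). Promoter means
    0 to `promoter_upstream` bp upstream of the transcription start site.
    """
    label = "intergenic"
    for start, end, strand in genes:
        tss = start if strand == "+" else end - 1
        # Upstream lies at lower coordinates on '+' and higher ones on '-'.
        upstream = tss - pos if strand == "+" else pos - tss
        if 0 <= upstream <= promoter_upstream:
            return "promoter"
        if start <= pos < end:
            label = "genic"
    return label
```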
Paired analysis of HBV integrations in HCC cfDNA and tissue
To study the origin of HBV integrations in cfDNA, we obtained 446 paired HCC tissues and performed WGS to evaluate the HBV integration landscape in tissue. First, HBV integration read counts in paired cfDNA and tumor tissue were weakly but significantly correlated (Fig. 4C). In total, 389 breakpoints were shared between cfDNA and paired HCC tissue samples (Fig. 4D). Most of the nonshared breakpoints were considered random, nonrecurrent events (n < 5; Supplementary Table S3). Among the 389 shared breakpoints, 49 were in TERT, accounting for 78% of the TERT breakpoints discovered in cfDNA, and eight were in KMT2B, accounting for 89% of the KMT2B breakpoints discovered in cfDNA. For TERT integration, the sensitivity and specificity of the SIM method were 0.56 and 0.91, respectively (Supplementary Table S5). In contrast, FN1 breakpoints were not detected in any HCC tissue, consistent with a nonmalignant tissue origin (Fig. 4E). Furthermore, the read frequency of breakpoints shared between cfDNA and paired tissue was significantly higher than that of nonshared breakpoints (Supplementary Fig. S4C), suggesting that breakpoints that had undergone vigorous clonal expansion were more likely to be detected in cfDNA. When the integration calling criterion in cfDNA was changed to ≥2 reads, the total number of cfDNA breakpoints fell to 1,068, of which 309 were also reported in paired tissues. Collectively, these results indicate that the recurrent, HCC-related HBV integrations found in cfDNA largely originated from HCC tumor tissue and that the SIM method could be applied as a noninvasive way to study the intratumoral HBV integration landscape.
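Supplementary Table S5 is not reproduced here, so the exact definition of these metrics is an assumption. A natural reading is a per-patient binary call (for example, TERT integration present or absent) with tissue WGS as the reference standard, as sketched below with the hypothetical function `sensitivity_specificity`.

```python
def sensitivity_specificity(cfdna_calls, tissue_calls):
    """Per-patient agreement of a binary call, taking tissue WGS as the truth.

    Both arguments map patient id -> bool and must cover the same patients.
    Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP).
    """
    tp = fp = tn = fn = 0
    for patient, truth in tissue_calls.items():
        called = cfdna_calls[patient]
        if truth and called:
            tp += 1
        elif truth:
            fn += 1
        elif called:
            fp += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return sensitivity, specificity
```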
HBV mutation hotspots
Previous studies mostly focused on mutations in the BCP region (nt1751–1769) of the HBV genome, and mutation hotspots across the whole HBV genome have not been delineated. We therefore applied cSMART-based SIM to reveal the mutation landscape of the HBV genome. Patients with HCC or LC were assigned to HBV genotypes on the basis of the HBV reference genome sequences (Supplementary Table S1). To remove false-positive mutation calls, the minimal sequencing depth was set at 10×. HBV mutation sites with a detection frequency above 10% and detected in more than 10 patients were defined as hotspots. In HBV genotype C of HCC, the mutation rates of A1762T and G1764A were 75.9% and 82.1%, respectively, whereas in HBV genotype B of HCC, the mutation rates at nt1762 and nt1764 were 35% and 31.6%, respectively. The mutation rates of HBV hotspots thus differed between genotypes B and C (Supplementary Table S6). Comparative analysis of HBV mutations between cfDNA and tissue data showed high concordance: the sensitivity/specificity of the SIM method was 0.65/0.83 for nt1762 and 0.68/0.82 for nt1764 (Supplementary Table S5). In patients with HCC and LC, HBV mutation hotspots were distributed rather evenly across the whole HBV genome, with no preference for particular HBV genes (Fig. 5A). The top 20 most frequent mutations overlapped substantially between HCC and LC, and the clonality of most overlapping mutations was significantly higher in HCC, suggesting clonal expansion of mutation hotspots in HCC (Supplementary Figs. S5A and S5B). Compared with patients with LC, ten HBV genotype C mutation hotspots were significantly associated with HCC, among which T1165C, T1978C, A841C, and G3210A (OR > 3) were identified as high-risk factors, whereas 15 mutation hotspots were inversely associated with HCC (Supplementary Table S6).
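The estimator behind the reported odds ratios is not stated. One conventional choice, sketched below under that assumption, is the sample odds ratio from a 2 × 2 table of mutation carriers and noncarriers in HCC versus LC, with a Woolf (log-scale) 95% confidence interval and a 0.5 Haldane–Anscombe correction when a cell is zero; `odds_ratio_ci` is a hypothetical name.

```python
import math


def odds_ratio_ci(hcc_mut, hcc_wt, lc_mut, lc_wt, z=1.96):
    """Odds ratio of carrying a mutation in HCC versus LC with a Woolf 95% CI.

    Adds 0.5 to every cell if any cell is zero (Haldane-Anscombe correction).
    Returns (odds_ratio, lower, upper).
    """
    cells = [hcc_mut, hcc_wt, lc_mut, lc_wt]
    if 0 in cells:
        cells = [x + 0.5 for x in cells]
    a, b, c, d = cells
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log(OR)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper
```

An OR > 3 marks a mutation as a high-risk factor in the text; the interval shows how precisely that estimate is pinned down given the number of carriers.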
Furthermore, most HCC-related HBV genotype C mutations were located between nt1100 and nt1500 (the HCC-related zone), where the HBV X protein and polymerase are encoded (Fig. 5B). For genotype B, five HBV mutation hotspots were significantly associated with HCC, among which A2721G, T2666C, A636T, and G167T (OR > 3) were identified as high-risk factors, whereas six mutation hotspots were inversely associated with HCC, compared with patients with LC. Unlike in genotype C, however, HCC-related genotype B mutations were sporadically distributed across the HBV genome (Supplementary Fig. S5C).
To investigate the clinical relevance of HCC- and LC-related mutation hotspots, we evaluated the relationships between clinical parameters and mutation frequencies. In HBV genotype C, the number of integration events was higher in patients with the HCC-related nt1978 mutation and lower in patients with the LC-related nt2771, nt2867, and nt2699 mutations. Patients with the HCC-related nt1165 and nt1978 mutations also had higher ctDNA concentrations (Fig. 5C; mutation hotspots without clinical relevance not shown). In HBV genotype B, patients with the LC-related nt2633 and nt2721 mutations had earlier BCLC stage and smaller tumor size, respectively (Supplementary Fig. S5D; mutation hotspots without clinical relevance not shown). We then evaluated the potential value of HCC/LC-related HBV mutations for indicating HCC. A diagnostic model was constructed by machine learning, with 578 cases (278 HCC + 240 LC) assigned to the training set and 222 cases (124 HCC + 98 LC) to the test set. The detailed patient assignment and the contributions of HCC/LC-related HBV mutations are shown in Supplementary Table S7. As shown in Fig. 5D, the combination of HBV mutations showed greater diagnostic power than AFP alone, improving the AUC from 0.78 to 0.88.
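The machine-learning model itself is not specified, but its AUC can be computed from any set of model scores. The sketch below uses the Mann–Whitney formulation (hypothetical function `roc_auc`): the probability that a randomly chosen HCC case scores higher than a randomly chosen LC case, with ties counted as one half. This equals the area under the empirical ROC curve, the quantity that `sklearn.metrics.roc_auc_score` reports.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney statistic.

    labels: 1 for HCC (positive), 0 for LC (negative).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))
```

The pairwise loop is O(n·m), which is adequate for a test set of a few hundred cases.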
Similarly, we explored the concordance of HBV mutations between cfDNA data and paired HCC tissue WGS data. In most cases, over 70% of the HBV mutations detected in HCC tissues were also found in the corresponding cfDNA samples.
Discussion
cfDNA has been known for decades to be present in the blood of patients with cancer, and its concentration in these patients has been shown to be much higher than in healthy controls and patients with nonmalignant disease (29). Numerous studies have reported that cfDNA can inform on tumor genetics, tumor epigenetics, and tumor burden (30), which enables inexpensive noninvasive testing and provides a viable source for serial sampling to screen for and monitor tumor progression. HBV integration is a crucial risk factor for HCC development and is present in approximately 90% of HBV-related HCC cases (2), yet HBV integration at the cfDNA level has been little studied. Previous HBV mutation studies focused on SNVs in the BCP region. BCP mutations in HBV-infected patients have been shown to correlate with HBV-related cancer risk (31) and to contribute to prognosis in HCC (9), whereas mutations in other regions have received little attention. Simultaneous detection of HBV integrations and mutations in cfDNA would therefore be ideal. To this end, we developed a novel cSMART-based method, SIM, that delineates HBV integration and mutation landscapes in cfDNA at the same time.
We applied SIM to 481 HCC cfDNA samples and detected HBV integration in 321 of them. Because the variable admixture of DNA released from normal tissues complicates the study of ctDNA properties, we estimated the ctDNA fraction in all HCC cases and stratified them using a cutoff of 10%. The number of HBV integration events was significantly higher in ctDNA-high HCC cases, consistent with the view that HBV integrations are mainly tumor derived. The average number of HBV integration events in ctDNA-high HCC cases was 45.7, even higher than that reported in HCC tissue (27), which could be attributed to the high sensitivity of cSMART. The HCC tissue-featured HBV integration hotspot TERT was also detected in ctDNA-high HCC cases at a similar rate (27.2%), whereas in ctDNA-low HCC cases its detection rate was only 16.9%. In contrast, FN1, an HBV integration hotspot previously found only in nonmalignant tissues of patients with HCC, was detected at a high rate (15.5%) in ctDNA-low HCC cases and at a low rate (7.4%) in ctDNA-high HCC cases, consistent with its nonmalignant tissue origin. We also performed HBV integration analysis on 517 LC blood samples and detected integration in 315 of them. As expected, the number of integration events was markedly lower in LC than in HCC, and the HCC tissue/cfDNA integration hotspot TERT was not detected in any LC sample. FN1 was the only HBV integration hotspot in LC cfDNA, with a detection rate (16.5%) similar to that in ctDNA-low HCC cases. Notably, the breakpoints in TERT and FN1 were located mainly in the promoter and intron regions, respectively, consistent with findings in HCC and nonmalignant tissues (3).
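The stratification above reduces to splitting cases at the 10% ctDNA-fraction cutoff and summarizing each group. In the minimal sketch below, the class and function names are hypothetical, and treating the cutoff as "at least 10% is ctDNA-high" is an assumption, since the text does not say which side of the boundary a case of exactly 10% falls on. How the ctDNA fraction itself is estimated is outside the scope of the sketch.

```python
from dataclasses import dataclass, field
from statistics import mean

CTDNA_CUTOFF = 0.10  # 10% cutoff from the text; ">= cutoff means high" is an assumption


@dataclass
class HCCCase:
    ctdna_fraction: float  # estimated tumor-derived fraction of cfDNA, 0-1
    # gene hit by each cfDNA HBV integration breakpoint (one entry per event)
    integration_genes: list[str] = field(default_factory=list)


def split_by_ctdna(cases: list[HCCCase], cutoff: float = CTDNA_CUTOFF):
    """Return (ctDNA-high, ctDNA-low) groups."""
    high = [c for c in cases if c.ctdna_fraction >= cutoff]
    low = [c for c in cases if c.ctdna_fraction < cutoff]
    return high, low


def hotspot_detection_rate(cases: list[HCCCase], gene: str) -> float:
    """Share of cases with at least one integration breakpoint in `gene`."""
    if not cases:
        return 0.0
    return sum(gene in c.integration_genes for c in cases) / len(cases)


def mean_integration_count(cases: list[HCCCase]) -> float:
    """Average number of integration events per case."""
    return mean(len(c.integration_genes) for c in cases) if cases else 0.0


# Hypothetical cohort (NOT study data).
cohort = [
    HCCCase(0.25, ["TERT", "TERT", "KMT2B"]),
    HCCCase(0.15, ["TERT"]),
    HCCCase(0.05, ["FN1"]),
    HCCCase(0.02, []),
]
high, low = split_by_ctdna(cohort)
for name, group in (("ctDNA-high", high), ("ctDNA-low", low)):
    print(name, mean_integration_count(group),
          hotspot_detection_rate(group, "TERT"), hotspot_detection_rate(group, "FN1"))
```

Detection rate is counted per case (a case with two TERT breakpoints counts once), matching how hotspot detection rates are reported in the text.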
To track the origin of cfDNA-level HBV integrations, we interrogated the HBV integration pattern in 446 paired HCC tissues via WGS. Most breakpoints (93.3%) found in HCC cfDNA were not detected in the paired tissues, which might be due to (i) a para-tumoral origin or (ii) tumor heterogeneity. However, when the integration calling criterion was changed to ≥2 supporting reads, the overlap rate increased to 28.9%. Notably, the integration read frequency of nonoverlapping breakpoints was significantly lower than that of breakpoints codetected in both cfDNA and tumor tissue, suggesting that nonoverlapping breakpoints were less clonal and overlapping breakpoints more clonal. For the integration hotspots TERT and KMT2B, most breakpoints were also detected in the paired tumor tissues. Breakpoints in TERT and KMT2B that went undetected in paired tissues could be explained by (i) different initial filtering criteria for cfDNA and tumor tissue sequencing data or (ii) tumor heterogeneity. Collectively, these results indicate that HBV integrations detected in cfDNA originate from the paired tissues and that clonal tumor integration events are more likely to be reflected in cfDNA.
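The effect of the read-support criterion on the overlap rate can be illustrated with a small matching sketch. It assumes that a cfDNA breakpoint "overlaps" when the paired tissue has a breakpoint on the same chromosome within a fixed window; the study's actual matching rule and window are not given here, so `window`, `min_reads`, the `Breakpoint` type, and all coordinates are hypothetical.

```python
from typing import NamedTuple


class Breakpoint(NamedTuple):
    chrom: str
    pos: int
    reads: int  # number of supporting reads


def overlap_rate(cfdna: list[Breakpoint], tissue: list[Breakpoint],
                 min_reads: int = 1, window: int = 0) -> float:
    """Share of cfDNA breakpoints (with >= min_reads support) that have a
    tissue breakpoint on the same chromosome within `window` bp."""
    called = [b for b in cfdna if b.reads >= min_reads]
    if not called:
        return 0.0

    def has_match(b: Breakpoint) -> bool:
        return any(t.chrom == b.chrom and abs(t.pos - b.pos) <= window
                   for t in tissue)

    return sum(has_match(b) for b in called) / len(called)


# Hypothetical breakpoints for one patient (coordinates are made up).
cfdna = [
    Breakpoint("chr5", 1_295_200, 8),
    Breakpoint("chr19", 36_210_500, 3),
    Breakpoint("chr2", 100_000, 1),
    Breakpoint("chr7", 500_000, 1),
]
tissue = [Breakpoint("chr5", 1_295_210, 20), Breakpoint("chr19", 36_210_500, 5)]

print(overlap_rate(cfdna, tissue))                          # 0.25
print(overlap_rate(cfdna, tissue, window=50))               # 0.5
print(overlap_rate(cfdna, tissue, min_reads=2, window=50))  # 1.0
```

In this toy example, raising `min_reads` drops the single-read breakpoints, which are the ones without a tissue match, so the overlap rate rises; this mirrors, in miniature, the direction of the change reported when the criterion was raised to ≥2 reads.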
Using SIM, we investigated for the first time the mutational landscape across the whole HBV genome. To assess the reliability of our method, we examined the mutation rates at nt1762 and nt1764 and found them similar to those reported in other studies (8). We further found that HBV genotypes B and C had different mutational landscapes across the whole viral genome, whereas the landscape was rather similar between HCC and LC: most mutation hotspots were identified in both HCC and LC at similar frequencies. However, these mutations showed stronger clonality in HCC, suggesting their expansion during the HCC stage. We also identified 10 HCC-related mutations in HBV genotype C and 5 in genotype B; interestingly, the genotype C HCC-related mutations were preferentially located in the region between nt1100 and nt1500. The combination of HCC/LC-related HBV mutations and AFP achieved higher diagnostic accuracy than AFP alone, indicating a possible application of the SIM method in HCC screening. However, a prospective validation cohort would add robustness to the current diagnostic model.
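"HCC-related" mutations are those with an odds ratio (OR) above 1 for HCC versus LC. The sketch below computes an OR from a 2x2 table of mutated and wild-type carriers and checks which sites fall inside the nt1100 to nt1500 zone. The 0.5 pseudocount (Haldane-Anscombe correction) is an assumption added only to keep the ratio finite when a cell is zero, the counts are hypothetical, and nt1165 and nt1978 are used simply because they are the two HCC-related genotype C sites named in this section.

```python
def odds_ratio(hcc_mut: int, hcc_total: int, lc_mut: int, lc_total: int,
               pseudocount: float = 0.5) -> float:
    """Odds of carrying a mutation in HCC vs LC from a 2x2 table.

    A pseudocount (Haldane-Anscombe correction) is added to every cell so
    the ratio stays finite when a cell is zero; set it to 0 to disable.
    """
    a = hcc_mut + pseudocount               # HCC, mutated
    b = hcc_total - hcc_mut + pseudocount   # HCC, wild type
    c = lc_mut + pseudocount                # LC, mutated
    d = lc_total - lc_mut + pseudocount     # LC, wild type
    return (a * d) / (b * c)


def share_in_zone(positions: list[int], start: int = 1100, end: int = 1500) -> float:
    """Share of HBV nucleotide positions falling inside [start, end]."""
    if not positions:
        raise ValueError("no positions given")
    return sum(start <= p <= end for p in positions) / len(positions)


# Hypothetical counts (NOT study data): 30/100 HCC vs 15/100 LC carry the mutation.
print(odds_ratio(30, 100, 15, 100, pseudocount=0))  # about 2.43
# The two HCC-related genotype C sites named in the text, nt1165 and nt1978:
print(share_in_zone([1165, 1978]))  # 0.5
```

An OR above 1 alone does not establish significance; a confidence interval or an exact test on the same 2x2 table would normally accompany it.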
In summary, by simultaneously detecting HBV integration and mutation in cfDNA with the cSMART-based method SIM, our study revealed that cfDNA HBV integrations in patients with HCC originate from tumor and paratumor tissues. HBV integration hotspots detected in cfDNA (such as TERT) may reflect highly clonal integration events within the tumor, although their detection is affected by ctDNA concentration. Furthermore, we confirmed the reliability of cSMART for HBV mutation detection and identified an HCC-related mutation hot zone on the HBV genome.
Authors’ Disclosures
No disclosures were reported.
Authors' Contributions
B. Zheng: Writing–original draft. X.-L. Liu: Investigation. R. Fan: Investigation. J. Bai: Writing–review and editing. H. Wen: Investigation. L.-T. Du: Investigation. G.-Q. Jiang: Investigation. C.-Y. Wang: Investigation. X.-T. Fan: Investigation. Y.-N. Ye: Investigation. Y.-S. Qian: Investigation. Y.-C. Wang: Investigation. G.-J. Liu: Investigation. G.-H. Deng: Investigation. F. Shen: Investigation. H.-P. Hu: Investigation. H. Wang: Investigation. Q.-Z. Zhang: Investigation. L.-L. Ru: Investigation. J. Zhang: Investigation. Y.-H. Gao: Investigation. J. Xia: Investigation. H.-D. Yan: Investigation. M.-F. Liang: Investigation. Y.-L. Yu: Investigation. F.-M. Sun: Investigation. Y.-J. Gao: Investigation. J. Sun: Investigation. C.-X. Zhong: Investigation. Y. Wang: Investigation. F. Kong: Investigation. J.-M. Chen: Investigation. D. Zheng: Investigation. Y. Yang: Investigation. C.-X. Wang: Investigation. L. Wu: Supervision. J.-L. Hou: Supervision. J.-F. Liu: Investigation. H.-Y. Wang: Supervision. L. Chen: Supervision.
Acknowledgments
This work was supported by the National Research Program of China (2017YFA0505803, 2017YFC0908100), the State Key Project for Liver Cancer (2018ZX10732202–001, 2018ZX10302207–004), National Natural Science Foundation of China (81790633, 81988101, 91859205, and 81830054), and National Natural Science Foundation of Shanghai (17ZR143800, 201901070007E00065). We thank the Shanghai Key Laboratory of Hepato-biliary Tumor Biology and the Military Key Laboratory on Signal Transduction for their support. This study was also supported by the Innovation Program of Shanghai Municipal Education Commission.
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.