Earliest hepatitis B virus-hepatocyte genome integration: sites, mechanism, and significance in carcinogenesis

Hepatocellular carcinoma (HCC) is the fifth most common cancer and the fourth leading cause of cancer-related deaths globally. Persistent infection with hepatitis B virus (HBV) remains the main cause of HCC, accounting for up to 50% of cases. Our recent studies, supported by findings from others, revealed that HBV and its close relative woodchuck hepatitis virus (WHV) integrate into the hepatocyte genome almost immediately, i.e., within minutes after infection. Retrotransposons and genes with translocation potential were frequent sites of HBV insertion, suggesting a mechanism by which HBV DNA spreads across the liver genome from the earliest stages after virus invasion. Many other genes were identified as sites of early hepadnavirus merges in human hepatocyte-like lines infected de novo with HBV and in the natural woodchuck model of WHV infection. Head-to-tail joins (HTJs) predominated among the earliest virus-host fusions, implying their formation via the non-homologous end-joining (NHEJ) pathway. Overlapping homologous junctions resulting from micro-homology-mediated overlapping joining (MHMOJ) were rarely detected. Formation of the initial HTJs coincided with strong induction of reactive oxygen species (ROS) and transient appearance of inducible nitric oxide synthase (iNOS). This was accompanied by cellular DNA damage and activation of the poly(ADP-ribose) polymerase 1 (PARP1)-mediated host DNA repair machinery, which may explain the predominant HTJ format of the first virus-host fusions. Identification of initial integration sites and the resulting alterations in hepatocyte phenotype may pave the way to the discovery of reliable markers of HBV-triggered HCC, including HCC resulting from occult HBV infection. Our research strongly argues that HBV is an ultimate human carcinogen capable of initiating a pro-oncogenic process immediately after first contact with a susceptible host.
Production Editor: Jing Yu. Chauhan et al. Hepatoma Res 2021;7:20. http://dx.doi.org/10.20517/2394-5079.2020.136


INTRODUCTION
Hepatitis B virus (HBV) is a pro-oncogenic DNA virus that has been identified as a leading risk factor for primary hepatocellular carcinoma (HCC) [1][2][3]. Once chronic infection, i.e., serum HBV surface antigen (HBsAg)-positive infection coinciding with chronic hepatitis type B (CHB), is established, the risk of developing liver cancer increases many-fold [4,5]. In recent years, the incidence of HCC has increased and new factors leading to HCC development have been uncovered, including protracted non-inflammatory liver disease (NFLD). Nonetheless, CHB and, more generally, persistent HBV infection remain the main cause of HCC despite the availability of highly effective vaccines against this virus [6][7][8]. Moreover, no therapy is capable of eliminating HBV from either symptomatically or silently infected patients [9,10]. The direct oncogenic properties of HBV are most explicitly evident in HCC developing in the absence of CHB and cirrhosis, as in cases of clinically silent HBV persistence [11,12]. This unapparent form of infection, termed occult HBV infection (OBI), results from enduring residual virus replication accompanied by traces of circulating HBV DNA in the absence of serum HBsAg detectable by currently available clinical tests [13]. The direct oncogenic potential of hepadnaviruses is also clearly apparent in woodchucks experimentally infected with woodchuck hepatitis virus (WHV), which overall represent an excellent model of the molecular and immunological events and pathological outcomes encountered in HBV-infected humans [14,15]. This is well exemplified by primary occult infection (POI) induced by intravenous (i.v.) injection of WHV at doses lower than 1000 virions, which can trigger HCC in the setting of seemingly entirely normal liver function and morphology, while low-level WHV replication and integrated viral DNA are detectable in the immune system and the liver [16].
HCC develops with a similar frequency of about 20% in woodchucks that have recovered from an acute episode of hepatitis, in which traces of infectious WHV persist for life [15]. This infection form is termed secondary occult infection (SOI); it remains serum WHV DNA-reactive at levels below 100-200 copies or virus genome equivalents (vge) per mL. It is also serum WHV surface antigen (WHsAg)-negative when evaluated by immunoassays with sensitivity comparable to that of clinical tests currently applied for HBsAg detection, although singular 22-nm WHsAg (envelope) particles and short tubular WHsAg forms can be detected in some animals by electron microscopy after ultracentrifugation [17]. In addition, antibodies to WHV core antigen (anti-WHc) and WHsAg (anti-WHs) are detectable in SOI, whereas POI is devoid of a WHV-specific humoral response [16,18]. In contrast, the incidence of HCC in the course of serum WHsAg-positive chronic hepatitis is much greater and reaches 80%-90%. This suggests that WHV persisting in the liver at a high replication rate, in the context of prolonged hepatic inflammation, significantly augments progression to HCC [19,20].
From the beginning of studies on the molecular biology of hepadnaviruses, integration of HBV DNA into the human genome was thought to be intimately associated with the virus life cycle [21][22][23][24]. Indeed, all stages of HBV infection and all in vitro and natural models of the infection examined so far have shown evidence of virus-host genomic fusions [25,26]. Integrated HBV X (HBx) and S gene sequences have been frequently identified in HBV-related HCC and associated non-tumorous hepatic tissue, and the prevailing opinion held that HBV-host DNA junctions form spontaneously and occur randomly throughout the liver genome [27][28][29]. Nonetheless, recent high-throughput studies indicated that certain genes may be more frequent targets for HBV integration than others, at least in hepatic tissue of patients with advanced CHB and HCC [30][31][32][33][34]. The oncogenic potency of HBV integrations appears to be mainly due to induction of genomic instability in hepatocytes and altered expression of individual genes with tumor-suppressive or pro-oncogenic amplifying functions [35][36][37][38][39][40][41][42][43].
Although it became generally acknowledged that integration of HBV DNA into the hepatocyte genome is an invariable consequence of HBV infection, the question of when the first virus-host DNA fusions are formed, asked for decades by others and by us, remained unanswered [16,44,45]. However, with the recent availability of HBV-susceptible human hepatocyte-compatible cultures, including cells overexpressing sodium taurocholate co-transporting polypeptide (NTCP), which serves as an HBV receptor, and with access to enhanced approaches capable of detecting and sequencing genomic junctions with high sensitivity and consistency, answering this question became more realistic [46][47][48][49][50]. To identify the time kinetics of formation of the first (also called initial or earliest) HBV-hepatocyte genomic fusions, the mechanisms of their creation, and the nature of the host sites involved, we explored human hepatocyte-like cells and cultured woodchuck hepatocytes susceptible to authentic (also termed native, wild-type, or naturally occurring) HBV or WHV, as well as HepG2 cells overexpressing NTCP infected with recombinant HBV [27,51,52]. We also analyzed WHV-host integrations in the woodchuck model by examining liver biopsies collected at 1 h or 3 h after infection with WHV [51]. The principal approach to identify virus-host genomic fusions in our studies was virus genome-specific inverse polymerase chain reaction (inv-PCR), the sensitivity and specificity of which for detecting viral integrants were enhanced by nucleic acid hybridization (NAH) via Southern blot analysis with probes containing complete HBV or WHV sequences. Only amplicons displaying virus-specific signals were subsequently cloned and sequenced, and the virus-host fusions and the host genes involved were identified with the help of specialized software.
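The logic of the inv-PCR/NAH workflow can be illustrated with a toy in silico sketch. It assumes that a host DNA fragment carrying viral sequence has been circularized by self-ligation, so that primers placed in the known viral sequence and oriented outward read through the unknown flanking host DNA. The function names, the toy sequences, and the use of a shared k-mer as a stand-in for Southern blot hybridization are all hypothetical simplifications, not the software or protocol used in the studies cited.

```python
from typing import Optional, Set


def inverse_pcr_flank(circle: str, viral_anchor: str) -> Optional[str]:
    """Return the sequence read outward from a known viral anchor on a circle.

    Inverse PCR primers sit inside the known (here: viral) sequence and point
    outward, so the product runs from the end of the anchor, through the
    unknown flanking DNA, back around to the start of the anchor.
    """
    n = len(circle)
    doubled = circle + circle  # lets the anchor span the self-ligation point
    idx = doubled.find(viral_anchor)
    if idx == -1 or idx >= n or len(viral_anchor) > n:
        return None  # no viral anchor on this circle: no virus-primed product
    start = idx + len(viral_anchor)
    return doubled[start:idx + n]


def kmers(seq: str, k: int) -> Set[str]:
    """All overlapping substrings of length k."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def passes_nah(amplicon: str, probe: str, k: int = 12) -> bool:
    """Crude stand-in for Southern blot hybridization: any shared k-mer."""
    return bool(kmers(amplicon, k) & kmers(probe, k))


if __name__ == "__main__":
    viral = "GATTACAGATTACA"  # hypothetical toy viral anchor
    host = "CCCCGGGGTTTT"     # hypothetical toy host flank
    circle = "GGGGTTTT" + viral + "CCCC"  # circularized fragment, arbitrary start
    print(inverse_pcr_flank(circle, viral))  # CCCCGGGGTTTT
    print(passes_nah(viral + host, viral), passes_nah(host, viral, k=8))  # True False
```

Doubling the circular sequence is a common trick for handling an anchor that happens to straddle the arbitrary start position of a circular string; only amplicons passing the hybridization filter would then be carried forward to cloning, sequencing, and mapping of the host flank.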
The current review summarizes these studies, as well as relevant works from other groups, to provide an overall perspective on the earliest time of the appearance and the nature of the initial HBV-host genome integrations, currently known mechanisms of their formation, and their potential biological significance.

THE EARLIEST MOLECULAR MARKERS OF HBV AND ITS REPLICATION IN HEPATOCYTES
The earliest molecular markers of HBV replication can broadly be divided into two categories, direct and indirect. HBV covalently closed circular DNA (cccDNA), which serves as the viral transcriptional template, and viral mRNAs are direct indicators of active replication, whereas detection of HBV DNA is a sign of virus presence and its possible propagation. The appearance of viral proteins in de novo infected cells is also considered a sign of initiation of hepadnaviral replication.
In recent in vitro experiments, HBV cccDNA first appeared as early as 24 h p.i. in the HepG2-NTCP-K7 cell clone; its level peaked at day 3 and increased only modestly during the 45-day follow-up [53]. In the same study, HBV mRNA increased from 3 days p.i. and plateaued at 6 days p.i., while viral proteins showed the same profile of increasing levels. Another study in HepG2-NTCP cells demonstrated intracellular protein-free relaxed circular HBV DNA as early as 12 h p.i., well before detection of cccDNA at 2-3 days p.i. [54]. In yet another recent work on HepG2-NTCP cells, HBV DNA became detectable at 6 h p.i. and cccDNA at 24 h p.i. [55]. Using an assay detecting HBV cccDNA by inv-PCR followed by NAH analysis of amplicons, another group reported the appearance of cccDNA at 16 h p.i. in the HepG2-NTCP cell line [56]. Furthermore, it has recently been shown that 3.5-kb HBV mRNA can be detected at 18 h p.i. in de novo infected primary human hepatocytes and the HepG2-NTCP-A3 cell clone [57]. In our study of the timeline of formation of the earliest HBV-host DNA integration sites in HepaRG cells, investigated with high-sensitivity PCR/NAH-based assays, HBV DNA and its RNA transcripts became detectable from 1 h p.i., and HBV cccDNA from three days p.i. onwards [51] [Figure 1].
Model infections with other hepadnaviruses showed timelines of first detection of viral DNA and its replication intermediates in the early stages of infection comparable to those mentioned above. Thus, infection of duck hepatocytes with duck hepatitis B virus (DHBV) generated virus pregenomic RNA from 12 h p.i., which peaked at 20 h p.i. and was followed by detection of DHBV core protein 12 h later [58]. In livers of Peking ducklings inoculated with DHBV, supercoiled virus DNA appeared at 6 h p.i. and viral 3.5-, 2.7-, and 2.4-kb mRNA transcripts at 12 h p.i., shortly followed by detection of single-stranded DHBV DNA. In addition, prominent increases in DHBV RNA in liver tissue were reported between 12 h and 72 h p.i. [59]. In the early stage of WHV infection, WHV DNA and mRNA became detectable by PCR/NAH assays in liver biopsies obtained at 1 or 3 h p.i., which remained WHV cccDNA-negative at these time points. WHV cccDNA became detectable in subsequent liver biopsies collected at six weeks p.i. [51,60]. In the woodchuck WCM260 line, derived from primary hepatocytes isolated from a healthy animal [61,62], quantifiable levels of WHV DNA became detectable from 6 h p.i., while unquantifiable signals were seen from 30 min p.i. by real-time PCR (qPCR) and NAH analysis of the resulting amplicons [52].

TIME OF THE FIRST APPEARANCE OF HEPADNAVIRUS-HOST GENOMIC MERGES
In our studies, the time of appearance of a given virus-host junction was defined as the time in minutes (min), hours (h), or days that elapsed between the first contact of cells or animals with the virus inoculum and detection of the junction. On this basis, integrations were categorized into three groups following a previously applied refined scheme [51]. Junctions found up to 24 h p.i. were designated very early integration sites (VEIS), those after 24 h and up to 72 h p.i. early integration sites (EIS), and those beyond 72 h p.i. late or not-early integration sites (NEIS). In addition, the first or earliest virus-host genomic fusions were termed initial integration sites (IIS).
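The time-based scheme above can be expressed as a small classification function. This is a minimal sketch: the function names are hypothetical, and treating the 24 h and 72 h boundaries as inclusive upper limits of VEIS and EIS is our reading of "up to 24 h" and "until 72 h". The example detection times are those reported for HBV-infected HepG2-NTCP-C4 cells later in this review.

```python
from datetime import timedelta
from typing import Dict, List


def classify_integration_site(elapsed: timedelta) -> str:
    """Assign VEIS/EIS/NEIS from the time elapsed since first virus contact."""
    if elapsed < timedelta(0):
        raise ValueError("elapsed time cannot be negative")
    if elapsed <= timedelta(hours=24):
        return "VEIS"  # very early: up to 24 h p.i.
    if elapsed <= timedelta(hours=72):
        return "EIS"   # early: after 24 h and up to 72 h p.i.
    return "NEIS"      # late / not early: beyond 72 h p.i.


def initial_integration_sites(detections: Dict[str, timedelta]) -> List[str]:
    """Return the junction(s) detected at the earliest time point (IIS)."""
    if not detections:
        return []
    earliest = min(detections.values())
    return sorted(name for name, t in detections.items() if t == earliest)


if __name__ == "__main__":
    # Detection times reported for HepG2-NTCP-C4 cells within 24 h p.i.
    hepg2 = {
        "NBPF-1": timedelta(minutes=30),
        "SINE": timedelta(minutes=30),
        "THE-1B-LTR": timedelta(hours=1),
        "PRKG1": timedelta(hours=3),
        "PRR16": timedelta(hours=3),
        "RunX1": timedelta(hours=24),
    }
    print(initial_integration_sites(hepg2))  # ['NBPF-1', 'SINE']
    print({name: classify_integration_site(t) for name, t in hepg2.items()})
```

Under this reading, a junction first seen at 3 days p.i. falls into EIS, and junctions at 7 days, 13 days, or 6 weeks p.i. fall into NEIS, consistent with how the time points are grouped in the studies summarized here.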
Our original study explored human hepatocyte-like HepaRG cells infected de novo with wild-type HBV [51]. The IIS became detectable at 1 h after exposure to virus, and HBV DNA integrations into five different host genes were identified [Table 1]. At the same time point, HBV DNA and its transcripts became detectable [51]. The other time points investigated in this study were three and seven days and two, four, and seven weeks p.i. No integration signals were detected before 1 h p.i., or in control cells collected at time 0 or after mock infection with normal human plasma (NHP). Overall, 9 HBV-host DNA fusions were classified as VEIS, 5 as EIS, and 11 as NEIS. There was a weak trend towards an increase in the number of HBV integrations over the period examined. In the same study, woodchuck liver biopsies collected prior to i.v. infection with WHV and at 1 h, 3 h, and 6 weeks p.i. were examined for virus-host genomic junctions using WHV-specific inv-PCR/NAH. Biopsies collected at 1 h p.i. revealed WHV DNA fusions with multiple (n = 8) host genes spread across different chromosomes [Table 1]. Overall, 10 sites were classified as VEIS in biopsies acquired at 1 h or 3 h p.i. and 7 as NEIS in biopsies acquired at six weeks p.i. Thus, the first virus-host merges appeared at the same time in cultured HepaRG cells infected with HBV and in woodchucks infected with WHV.
To advance characterization of the human genomic sites forming initial fusions with HBV, in a subsequent study we examined HBV integrations into the genome of the HepG2-C4 cell clone stably transfected with NTCP (HepG2-NTCP-C4) [27]. These cells have shown high susceptibility to HBV infection and the capability of efficiently producing infectious HBV virions [27]. HBV genotype D secreted by HepG2.2.15.7 cells was used as the inoculum. Infected HepG2-NTCP-C4 cells were investigated for HBV-host junctions at 15 min, 30 min, 3 h, 24 h, and 13 days p.i. The initial virus insertion sites (i.e., IIS) in two different host sequences were identified 30 min after exposure to HBV [Table 1 and Figure 1]. In total, of the 15 integration sites detected across nine chromosomes, six were identified as VEIS and nine as NEIS [Table 1]. This study showed for the first time that the first fusions of HBV DNA with the host genome can occur as early as 30 min after contact with virus. The results also suggested that in vitro infection of highly susceptible cells with recombinant HBV may be superior to natural infection with wild-type virus in enabling hepadnavirus-host genomic merges.
Another research group also investigated the timeline of HBV integration [63]. In vitro models utilizing recombinant HBV and primary human hepatocytes or hepatocyte-like HepaRG, HepG2, and Huh7 cells transfected with NTCP provided evidence of HBV integration into numerous sites of the host genome between one and nine days after infection. The first junctions were detected at the first time point investigated in Huh7-NTCP cells, i.e., at 24 h p.i. In addition, the authors used the HBV entry inhibitor Myrcludex B (MyrB), which targets the hepatocyte HBV receptor NTCP, and showed that HBV DNA integration can be blocked at this early time point; however, this interpretation was based on agarose gel visualization of the inv-PCR products [63].
Another of our studies examined the timeline of appearance of the earliest hepadnavirus-host integrations in cultured woodchuck hepatocytes infected de novo with wild-type WHV [52]. Although the main purpose of this study was to identify a mechanism involved in the formation of the initial virus-host fusions, we also examined when these fusions were first assembled. Hepatocytes were examined between 15 min and 72 h after exposure to WHV, while virus-host integrations were analyzed at 15 min, 30 min, and 1 h p.i. to focus our efforts on events occurring at the initiation of infection. As controls, WCM260 cells not exposed to WHV (time 0) and cells incubated with normal woodchuck plasma (NWP) were examined. In this infection system, the IIS were detected at 15 min p.i., and virus fusions were identified with four different host genomic sequences [Figure 1]. Four other junctions were detected at 30 min p.i. and three more at 1 h p.i. By definition, all of them belonged to the VEIS category. Neither unexposed WCM260 cells (time 0) nor mock-infected cells showed integration signals. These results further modified our perception of the time required for hepadnaviral DNA to integrate into the host genome and strongly suggested that hepadnavirus DNA integration occurs immediately after virus entry into the cell, apparently even before initiation of virus replication [Figure 1]. This extremely short interval after first contact with virus was no longer surprising once the kinetics of virus-induced oxidative stress and DNA repair machinery, determined in the same infection system, became known [52].

HBV EARLY IN INFECTION FREQUENTLY INTEGRATES INTO HOST NON-CODING DNA TRANSPOSABLE ELEMENTS
In recent studies, we found that at the very early stages of infection HBV frequently integrates into human mobile genetic elements containing repetitive non-coding genomic sequences, such as retrotransposons and transposons, and into genes with translocation potential [27,51]. Thus, in the HBV-HepaRG cell infection model, HBV DNA integrations into or in close proximity to LINE1 (L1) and LINE2 (L2) were identified at 3 and 24 h p.i. [51]. These merges were sometimes evident in the majority of clones derived from a particular time point p.i., as for the HBV-LINE1 fusion identified at 24 h p.i. [51]. HBV DNA junctions with LINE1 or LINE2 were also detected at later time points p.i. in this model. In another study, HBV junctions with LINE1 were also detected in Huh7-NTCP cells at 24 h p.i. [63]. Notably, over one-fifth of the human genome is composed of DNA elements belonging to the LINE1, LINE2, or LINE3 (i.e., chicken repeat-1, CR1) families [64]. Among these three families, LINE1 is the most abundant and is represented by an estimated 500,000 copies, which together constitute about 18% of the human genome, while LINE2 and LINE3 make up 3% and 0.3% of the genome, respectively [65,66]. LINE1 is mostly autonomous, displays endonuclease and reverse transcriptase activities, and transposes through a mechanism termed target-primed reverse transcription (TPRT) [67]. LINE2 is a fossil (no longer active) element and is commonly located within intronic regions of the human genome [68,69].
In addition, HBV fusions with human satellite II DNA (HSAT-II), another repetitive non-coding element, were detected at later time points, i.e., 3 days and 14 days p.i. This merge was particularly well represented at 14 days, when it was identified in 12 clones. Integration of HBV DNA with HSAT-II has not been described before; however, HBV merges with the HSAT-III sequence have been reported in a hepatoma cell line and in HCC tissue [51,70,71]. Overall, of the 22 unique integration sites detected in HepaRG cells infected with native HBV, 5 (23%) were transposable or other repetitive elements. Clones carrying junctions with these repetitive non-coding DNA sequences represented 46% of all clones with virus-host merges (35/76) and 37.5% of those with integration sites classified as VEIS (9/24).
In the more efficient infection of HepG2-NTCP-C4 cells with HBV, as judged by the time to appearance of the first HBV-host genomic fusions, which was half that observed in HepaRG cells (30 min vs. 1 h p.i.), one of the two IIS detected at 30 min p.i. was a junction with a short interspersed nuclear element (SINE). The HBV-SINE fusion was represented in 17 (85%) of 21 clones carrying IIS. SINEs are non-long terminal repeat (non-LTR) retrotransposons that are abundant in the human genome and may significantly influence its size [72,73]. SINEs are also thought to promote oncogenic transformation [74]. Among the VEIS identified in HepG2-NTCP-C4 cells, there was also an HBV fusion with the THE-1B long terminal repeat (THE-1B-LTR) retrotransposon element, which belongs to the mammalian apparent LTR retrotransposon (MaLR) family [75]. This junction was the only one detected at 1 h p.i. but was well represented, as seven separate clones confirmed this sequence [27]. THE-1B-LTR appears to play an important role in the development of non-Hodgkin's lymphoma [76]. Since the HBV DNA merged with THE-1B-LTR encompassed virus enhancer II (Enh-II), a possible pathogenic relevance of this fusion might lie in modulation of retrotransposon activity and hepatocyte functions [27]. At one late time point after infection, i.e., 13 days p.i., direct or indirect merges of HBV DNA with three other elements of the retrotransposon or transposon class were identified in HepG2-NTCP-C4 cells. HBV DNA was integrated with hobo activator-18 Salmo salar long terminal repeat (hAT-18-SsA). There are about seven dozen hAT elements in humans, which transpose through a DNA intermediate; in contrast, retrotransposons usually rearrange the genome through an RNA intermediate [77]. Interestingly, hAT-18-SsA was in turn joined to a non-coding sequence of chromosome 2 (CH-2), which was itself joined to medium reiterated frequency repeat 5B (MER-5B), another non-coding sequence of the transposon class.
This resulted in a complex trimeric structure formed by HBV DNA and host sequences. Of note, MER-5B is known to control expression of the alpha-fetoprotein (AFP) gene, whose product is abundantly expressed in human fetal liver [78]. It has also been shown that retroviral insertions may cause protracted expression of AFP as well as H19 in the liver [79]. In this regard, we previously documented WHV DNA integration into the woodchuck H19 gene [16]. Taken together, these findings may suggest a link between hepadnaviral DNA insertions and elevated AFP levels; the molecular basis of this possible relationship will require future investigation. In addition, an HBV DNA-LINE2 merge was another fusion with a retrotransposon identified among the NEIS in HepG2-NTCP-C4 cells. In total, of the 15 HBV-host DNA integration sites identified, five (33%) were merges with transposable elements. Clones carrying these fusions comprised 41% of all clones with virus-host junctions (32/78) and 49% of those with merged sequences classified as VEIS (24/49). To the extent that HBV integration profiles in HepaRG and HepG2-NTCP-C4 cells can be compared, the data indicate that HBV infection in HepG2-NTCP-C4 cells was characterized by a greater proportion of virus fusions with transposable elements (33% vs. 23%) and a larger proportion of clones carrying these fusions among clones with integration sites classified as VEIS (49% vs. 37.5%). Regarding the nature of the transposable elements joined with HBV, only the LINE retrotransposon family was identified in both cell lines.
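The percentages compared above can be recomputed directly from the counts reported for the two cell lines. The sketch below is a plain arithmetic check; the dictionary layout and metric names are hypothetical, and no statistical test is implied, since the text compares the proportions descriptively.

```python
from typing import Dict, Tuple

# Counts (numerator, denominator) as reported in the text for each system.
COUNTS: Dict[str, Dict[str, Tuple[int, int]]] = {
    "HepaRG": {
        "sites_with_TE": (5, 22),        # unique sites that are transposable/repetitive
        "clones_with_TE": (35, 76),      # such clones among all junction clones
        "VEIS_clones_with_TE": (9, 24),  # such clones among clones with VEIS
    },
    "HepG2-NTCP-C4": {
        "sites_with_TE": (5, 15),
        "clones_with_TE": (32, 78),
        "VEIS_clones_with_TE": (24, 49),
    },
}


def percent(numerator: int, denominator: int) -> float:
    """Percentage rounded to one decimal place."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    return round(100 * numerator / denominator, 1)


def summarize(counts: Dict[str, Dict[str, Tuple[int, int]]]) -> Dict[str, Dict[str, float]]:
    """Convert every (numerator, denominator) pair into a percentage."""
    return {
        system: {metric: percent(n, d) for metric, (n, d) in metrics.items()}
        for system, metrics in counts.items()
    }


if __name__ == "__main__":
    for system, metrics in summarize(COUNTS).items():
        print(system, metrics)
```

Laying the three metrics side by side also shows that the proportion among all junction-carrying clones is slightly higher in HepaRG cells (46.1% vs. 41.0%); the comparison drawn in the text concerns the other two metrics.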
However, when these results were compared with those from another study investigating HBV integration after de novo infection of HepG2-NTCP or Huh7-NTCP cells [63], four of the same or similar retrotransposon or transposon elements were found: SINE, detected at five and seven days; the THE-1B-related THE-Int at five days; the MER-5B-like MER52D/41A/90A/4E1/4A at seven days; and LINE1 between one and seven days p.i. Overall, there was good agreement between findings from infection systems utilizing cells expressing NTCP and either patient-derived or recombinant HBV as inocula, which together further support the authenticity of the findings.

HEPATOCYTE GENES TARGETED FOR INTEGRATION BY HBV AND WHV IN THE FIRST 24 H AFTER INFECTION
Based on clonal sequencing analysis or direct sequencing of virus-host junctions, many different host genes, other than genomic repetitive elements, were found to be insertion sites with which HBV DNA fused initially (i.e., IIS) or very early (i.e., VEIS) after infection [27,51,63]. Thus, in our study of HepaRG cells infected with authentic HBV, viral integrations into five different genes were detected at 1 h p.i. These genes were neurotrimin (NTM), located on chromosome (Ch)-11q25; acidic (leucine-rich) nuclear phosphoprotein 32 family member E (ANP32E) on Ch-1q21.1; ribosomal protein S3A pseudogene 26 (S3A-26) on Ch-2q22.1; ankyrin 3 (ANK3) on Ch-10q21.2; and fibroblast growth factor 14 (FGF14) on Ch-13q33.2 [Table 1]. Two other genes, dihydropyrimidine dehydrogenase (DPYD) on Ch-1q21.3 and Ro-associated Y pseudogene (RNY-1) at the q36.1 locus of Ch-7, were detected at 24 h p.i. [51]. Interestingly, the profiles of these very early integrations differed markedly between the two HBV inocula used. The inoculum containing HBV genotype C produced initial virus-host fusions with just two host sequences, but these sequences were present in multiple clones. The two sites identified were NTM, encoding a neuronal adhesion molecule, at 1 h p.i. and the retrotransposon LINE1, mentioned in the preceding section, at 24 h p.i. [Table 1]. In contrast, the second inoculum, which carried HBV genotype A, generated junctions with several host genes whose sequences were detectable in single clones only [Table 1]. This observation suggests that the virus itself could predetermine the pattern of virus-host fusions.
Huh7-NTCP cells infected with recombinant HBV for 24 h also showed virus integrations into several host genes [Table 1]. HBV DNA fusions with the long non-coding RNA gene RP11-63E9.1, sorting nexin 29 pseudogene 2 (SNX29P2), and Homo sapiens BAC clone RP11-98L17 (AC116618.1) were identified, and their existence was confirmed by direct Sanger sequencing [63]. In another study examining HBV-infected HepG2-NTCP-C4 cells, virus junctions with multiple host genes were detected within 24 h p.i., and almost all of them were identified in multiple sequenced clones [27]. The first fusions became detectable 30 min after exposure to virus, when HBV DNA insertions into the neuroblastoma breakpoint family member 1 (NBPF-1) gene on Ch-1p36.13 and into a SINE retrotransposon at the q23.2 locus of Ch-10 were detected [Table 1]. Notably, NBPF-1 has been described as encoding a tumor suppressor in neuroblastoma [80]. Other HBV DNA insertions, into the protein kinase cGMP-dependent type 1 (PRKG1) gene located at Ch-10q11.23 and the proline rich 16 (PRR16) gene on Ch-5q23.1, were found at 3 h p.i., and into the runt-related transcription factor 1 (RunX1) gene on Ch-21q22.12 at 24 h p.i. PRKG1 encodes a cyclic GMP-dependent protein kinase that regulates cell signaling and growth mainly in skeletal muscle and neuronal cells [81], and RunX1 plays a role in hematopoiesis and possibly in the pathogenesis of acute myeloid leukemia and HCC [82].
Finally, the woodchuck WCM260 hepatocyte line infected with wild-type WHV showed virus-host junctions identifiable from 15 min p.i. by inv-PCR/NAH followed by cloning of WHV-reactive amplicons and sequencing of the clones [52]. In total, 12 clones carrying 11 unique WHV-host DNA fusions were detected between 15 and 60 min p.i. Five of the junctions involved the WHV X gene and five others the WHV preS region. Among the fused host sequences, one was identified as the woodchuck olfactory receptor family 6 subfamily C member 66 pseudogene (OR6C66P) [Table 1].
Of all the host genes mentioned above [Table 1], the following have been directly or indirectly linked to cellular gene translocation: NTM [83], ANP32E [84,85], S3A [86], and FGF14 [87], found in HBV-infected HepaRG cells at 1 h p.i. [51]; MAML2 [88] and PHACTR3 [32], detected at 1 h p.i. in liver biopsies of WHV-infected woodchucks [51]; and NBPF-1 [89] and RunX1 [90], found in HBV-infected HepG2-NTCP-C4 cells at 30 min or 24 h p.i. [27]. Thus, after excluding the non-coding retrotransposable and transposable elements detected up to 24 h p.i., such as FLRT2/L2, LINE1, RNY-1, RP11-63E9.1, SNX29P2, AC116618.1, SINE, THE1B-LTR, and 11 unidentified woodchuck genomic elements [Table 1], the analysis showed that 8 (50%) of the remaining 16 sequences had predicted translocation potential. When both non-coding transposable elements, which by their nature are mobile and prone to translocation, and the genes with predicted translocation potential are taken into account, the great majority of identifiable host sequences joined by HBV or WHV were those that could translocate genomic and inserted exogenous sequences across the hepatocyte genome. These comprised NTM, ANP32E, S3A-26, FGF14, FLRT2/LINE2, LINE1, MAML2, PHACTR3, NBPF-1, SINE, THE1B-LTR, and RunX1, which accounted for 12 of 19 (63%) sequences identified as hepadnaviral insertion sites. This further strengthened the hypothesis proposed in our first study on this subject [51] that HBV can engage, from the beginning of infection, mobile genetic elements, including genes with translocation capabilities, to prompt pro-oncogenic perturbations throughout the host genome that may compromise overall genome stability and either augment or silence expression of individual genes important to the development of HCC.
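The 12-of-19 figure is the size of the union of two disjoint groups named in the paragraph above: genes with predicted translocation potential and mobile transposable elements. The sketch below reproduces that arithmetic; the set names are hypothetical, the members are copied from the text, and the denominators (19 identified insertion sites overall, 16 after excluding non-coding elements) are taken as stated rather than recomputed from Table 1.

```python
from typing import Set

# Host sequences named in the text as linked to gene translocation.
TRANSLOCATION_GENES: Set[str] = {
    "NTM", "ANP32E", "S3A-26", "FGF14", "MAML2", "PHACTR3", "NBPF-1", "RunX1",
}
# Mobile (transposable) elements among the identified early insertion sites.
TRANSPOSABLE_ELEMENTS: Set[str] = {"FLRT2/LINE2", "LINE1", "SINE", "THE1B-LTR"}

TOTAL_IDENTIFIED_SITES = 19  # denominator as stated in the text
NON_TE_SEQUENCES = 16        # sequences remaining after excluding non-coding elements


def share(n: int, total: int) -> float:
    """Percentage rounded to one decimal place."""
    return round(100 * n / total, 1)


if __name__ == "__main__":
    mobile_or_translocating = TRANSLOCATION_GENES | TRANSPOSABLE_ELEMENTS
    print(len(mobile_or_translocating))                                      # 12
    print(share(len(mobile_or_translocating), TOTAL_IDENTIFIED_SITES))       # 63.2
    print(share(len(TRANSLOCATION_GENES), NON_TE_SEQUENCES))                 # 50.0
```

Using sets makes the assumption explicit that no sequence is counted in both groups; if one were, the union would be smaller than the sum of the two groups.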

CHROMATIN MARKS ON HOST GENOMIC SEQUENCES TARGETED BY HBV INTEGRATION
Whether hepadnaviral sequences integrated into the host genome can be transcribed depends on the characteristics of the sites with which virus DNA merged. By identifying the presence or absence of epigenetic modifications on the joined host sequences, the transcriptional activity or latency of integration sites can be predicted. In one of our studies investigating the kinetics of formation and the nature of the earliest HBV-host genome junctions [27], in silico analysis was performed to identify the histone chromatin signatures H3K4me3 (trimethylation of lysine 4 on histone H3) and H3K27ac (acetylation of lysine 27 on histone H3); H3K4me3 is generally enriched at promoter regions, whereas the presence of the H3K27 acetylation mark is attributable to a transcriptionally active state [91,92]. In addition, we searched for CCCTC-binding factor (CTCF) clusters linked to insulator activity, for binding sites of enhancer of zeste homolog 2 (EZH2), a histone methyltransferase, and for DNase I hypersensitive regions [93][94][95]. In the HBV infection model in HepG2-NTCP-C4 cells, in which six distinct virus-host integration sites were identified in the first 24 h p.i., two of the sites, encompassing host SINE and NBPF-1 sequences, were detected at 30 min p.i. and classified as IIS [Table 1]. Analysis of these two sites revealed commonalities in the status of H3K4me3, EZH2, and DNase marks but differences in the status of H3K27ac. In contrast to NBPF-1, the SINE sequence carried the acetylation mark H3K27ac, suggesting that the HBV-SINE DNA merge could be in a transcriptionally active state. At the remaining four integration sites detected between 1 h and 24 h p.i., the H3K4me3 mark was present while the acetylation mark H3K27ac was absent. Accordingly, this profile implied that the sequences forming junctions with HBV DNA during this period were unlikely to be transcriptionally active.
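The interpretation applied above can be written as a deliberately simplified decision rule in which the presence of H3K27ac is the deciding mark. This is a sketch under that assumption only: the class and function names are hypothetical, H3K4me3 is recorded but does not change the outcome, and real chromatin-state calling integrates many more marks and quantitative signal. The example sites and mark states are taken from the HepG2-NTCP-C4 results described in the text; where the text does not state a mark's value for a site, it is left as unknown.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SiteMarks:
    """Chromatin marks scored at a host sequence fused with HBV DNA."""
    name: str
    h3k27ac: bool
    h3k4me3: Optional[bool] = None  # None = status not specified here


def predicted_state(site: SiteMarks) -> str:
    """Simplified rule: H3K27ac present -> likely transcriptionally active."""
    if site.h3k27ac:
        return "likely active"
    # H3K4me3 alone (without acetylation) is not taken as evidence of activity.
    return "unlikely active"


if __name__ == "__main__":
    sites = [
        SiteMarks("SINE", h3k27ac=True),                   # IIS, 30 min p.i.
        SiteMarks("NBPF-1", h3k27ac=False),                # IIS, 30 min p.i.
        SiteMarks("PRKG1", h3k27ac=False, h3k4me3=True),   # VEIS, 3 h p.i.
    ]
    for site in sites:
        print(site.name, predicted_state(site))
```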
CTCF has important regulatory functions, including long-range gene activation, insulation, imprinting, and cell differentiation [96,97]. Interestingly, in the study mentioned above, CTCF binding motifs were found on almost all host sequences detected up to 24 h p.i., except for PRR-16 [Table 1]. DNase hypersensitive sites were absent from all sequences forming sites classified as VEIS, as well as from seven of eight sequences merged with HBV that were identified at 13 days p.i. in the same study [27]. These results suggest that the host regions investigated were likely in a closed chromatin conformation that restricts chromatin accessibility. As the functional significance of chromatin marks has become better understood, the prominent role of polycomb repressive complex 2 (PRC2) has been uncovered [98]. We therefore analyzed the presence of the EZH2 mark, as EZH2 is the key catalytic subunit of PRC2, which methylates histone H3 at lysine 27 [95]. Although we could not find EZH2 binding sites at the initial stage of HBV DNA integration, i.e., at 30 min p.i., we found them in three of four host sequences merged with HBV DNA between 1 h and 24 h p.i. and in seven of eight sequences detected at 13 days p.i. These findings progressively illuminate the role of key chromatin marks that may also govern HBV-driven initiation of hepatocyte oncogenic transformation culminating in liver cancer.
In this context, some advances have been made in determining the methylation status of integrated HBV sequences. One study found methylation of integrated HBV DNA in SNU-398 cells derived from HBV-associated HCC [99]. However, this mark was not found on the integrated HBV sequence in another cell line, Hep3B, which also expresses HBV envelope proteins [99]. Methylation of integrated HBV DNA might be related to the methylation state of the adjacent host sequences. To investigate this, a study used a next-generation sequencing-based method for structural methylation analysis of integrated viral genomes [100]. It was found that integrated HBV DNA is significantly methylated when the fused host genomic sites are already highly methylated, whereas HBV DNA integrated into unmethylated sites, such as promoters and enhancers, remains unmethylated [100].

HBV AND WHV DNA BREAKING POINTS AT WHICH VIRUS-HOST JUNCTIONS HAVE FORMED
For the HBV and WHV genomic sites that formed junctions in the first 24 h p.i., the sequences of the virus-host DNA fusions were analyzed to determine where the breaking points in viral DNA occurred and whether they clustered in particular regions. However, no study has yet analyzed the whole hepadnavirus genome in this respect; only parts of the hepadnaviral sequence, particularly the X gene, have been investigated using sensitive inv-PCR-based methods. Therefore, other regions of viral DNA in which breaks capable of forming fusions with host sequences arise are likely to be found once more comprehensive analyses of the earliest stages of infection become feasible.
Using the HBV-HepaRG cell infection system [51], the majority of HBV-host junctions were formed by the HBx gene sequence comprising the enhancer-II (Enh-II) and basal core promoter (BCP) regions, located between nucleotides (nt) 1246 and 1829 (nucleotide positions according to HBV GenBank sequences X70185 and AB033556) [51]. Within this fragment, 91.7% (22/24) of all identified DNA breaking points were located in the HBx 1603-1829 sequence. Further, 25% (6/24) were confined to Enh-II between nts 1659 and 1739, and 41.7% (10/24) to the BCP between nts 1764 and 1829. Thus, 66.7% (16/24) of DNA breaks lay within the HBx gene sequence overlapping the HBV regulatory elements BCP and Enh-II, indicating that these elements were the most prone to breaks that formed junctions with the host genome. The breaking points in the BCP were further divided into two clusters. The first cluster spanned nts 1764-1808 and contained the HBV TATA-like binding sequences (TA2-TA4) between nts 1758 and 1795 and the precore mRNA initiation sites between nts 1788 and 1795. Six breaking points were identified in this region, including one with six hits at position 1764. The second cluster encompassed nts 1816-1829 and contained the HBV pregenomic RNA initiation site at nt 1818 [101], where one HBV DNA breaking point was found. In addition, this study showed that enumerating HBV DNA breaking points according to virus genomic regions is feasible, and it validated the methodology used, which included clonal sequencing of the inv-PCR/NAH products displaying HBV-specific signals.
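The regional percentages above are simple proportions of the 24 mapped breakpoints. The Python sketch below does two things. First, it assigns a breakpoint position to the Enh-II or BCP window, using the inclusive nucleotide boundaries given in the text (X70185/AB033556 numbering [51]). Second, it recomputes the reported fractions from the stated counts, rounding to one decimal place. The helper names `region_of` and `percent` are hypothetical. The only positions queried are the two named in the text, 1764 and 1818.

```python
# Inclusive region boundaries as given in the text (HBV X70185/AB033556 numbering [51]).
REGIONS = {
    "Enh-II": (1659, 1739),
    "BCP": (1764, 1829),
}


def region_of(nt):
    """Hypothetical helper: name of the regulatory region containing nt, or None."""
    for name, (start, end) in REGIONS.items():
        if start <= nt <= end:
            return name
    return None


def percent(count, total):
    """Hypothetical helper: percentage rounded to one decimal place."""
    return round(100 * count / total, 1)


TOTAL = 24  # breakpoints identified in HBV-infected HepaRG cells [51]
counts = {"HBx 1603-1829": 22, "Enh-II": 6, "BCP": 10}
counts["Enh-II + BCP"] = counts["Enh-II"] + counts["BCP"]

for label, n in counts.items():
    print(f"{label}: {n}/{TOTAL} = {percent(n, TOTAL)}%")

print(region_of(1764))  # position reported with six hits
print(region_of(1818))  # pregenomic RNA initiation site
```

Positions in the short gap between the two windows (nts 1740-1763) return `None`. This is why the Enh-II and BCP counts add up to 16 rather than to the 22 breakpoints in the wider 1603-1829 sequence.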
By analyzing data from HBV-infected Huh7-NTCP cells reported by another group [63], we found the following fusions among the four VEIS identified at 24 h p.i.: the HBV fragment spanning nts 1790-1809 was fused with retrotransposon LINE1, the HBV nt 1793-1822 sequence with host RP11-63E9.1, the HBV nt 1760-1790 sequence with host SNX29P2, and the HBV fragment between nts 1698 and 1726 with host AC116618.1 [Table 1]. Together, the HBV fragments with DNA breaking points engaged in joining with the Huh7-NTCP cell genome spanned nts 1698-1822. Furthermore, in our study using HepG2-NTCP-C4 cells as HBV infection targets [27], six different host genes or genetic elements were fused with HBV from 30 min to 24 h p.i. [Table 1]. The HBV DNA breaking points forming these junctions were located between nts 1647 and 1945. All nucleotide positions were enumerated according to the same HBV reference as cited previously [51]. Thus, the great majority of the DNA breaking points fused with Huh7-NTCP and HepG2-NTCP-C4 cell genomes were within the same HBV genomic region as the fusions identified in HepaRG cells. Specifically, the HBV Enh-II and BCP sequences appeared to be the most prone to DNA breakage.
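The combined span quoted for the Huh7-NTCP junctions is the envelope of the four fused HBV fragments listed above. The Python sketch below computes that envelope using the fragment coordinates from the text [63]. It also checks which fragments overlap the Enh-II and BCP windows delimited for the HepaRG breakpoints [51]. Applying the HepaRG boundaries to this dataset is our assumption for illustration. The helper names `overlaps` and `envelope` are hypothetical.

```python
# HBV fragments fused to host sequences in Huh7-NTCP cells at 24 h p.i. (inclusive nt) [63].
FRAGMENTS = {
    "LINE1": (1790, 1809),
    "RP11-63E9.1": (1793, 1822),
    "SNX29P2": (1760, 1790),
    "AC116618.1": (1698, 1726),
}

# Regulatory windows as delimited for the HepaRG breakpoints [51] (assumed to apply here).
REGIONS = {"Enh-II": (1659, 1739), "BCP": (1764, 1829)}


def overlaps(a, b):
    """Hypothetical helper: True if inclusive intervals a and b share at least one nt."""
    return a[0] <= b[1] and b[0] <= a[1]


def envelope(intervals):
    """Hypothetical helper: smallest inclusive interval covering all given intervals."""
    return (min(s for s, _ in intervals), max(e for _, e in intervals))


span = envelope(FRAGMENTS.values())
print("combined span:", span)

region_hits = {
    host: [name for name, reg in REGIONS.items() if overlaps(frag, reg)]
    for host, frag in FRAGMENTS.items()
}
for host, hits in region_hits.items():
    print(host, FRAGMENTS[host], hits)
```

The check tests overlap rather than containment. For example, the SNX29P2 fragment begins four nucleotides upstream of the BCP window yet still counts as a BCP hit.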
In the woodchuck infection model, WHV DNA breaks engaged in forming junctions with host genomic sequences in the first 3 h p.i. were predominantly located in the WHx gene, in the sequence between nts 1853 and 1876 containing the virus BCP region [51]. This WHV DNA fragment created fusions with five different VEIS. In addition, we applied inv-PCR/NAH with primers specific for the WHV preS genomic region, which made it possible to identify host fusions involving nucleotides within this region and the downstream P gene sequence. The WHV preS DNA breaking points forming fusions with host sites classified as VEIS were located between nts 3300 and 3309 (nucleotide positions enumerated according to GenBank sequence AY334075).
In yet another model, in vitro infection of the WCM260 hepatocyte line with WHV was used to investigate the mechanism of hepadnaviral integration during initial infection [52]. In this study, WHV DNA integrations were uncovered as early as 15 min p.i. The WHV DNA breaking points were found in the WHx gene sequence of nts 1360-1934 and in sequences predominantly spanning the preS1 region of nts 2904-3308.

MOLECULAR FORMATS OF THE VERY EARLY HEPADNAVIRUS-HOST GENOMIC JUNCTIONS
Recent analyses at single-nucleotide resolution demonstrate that the HBV-host and WHV-host DNA fusions created in the first 24 h p.i. were mostly of the head-to-tail (HTJ) type, while overlapping homologous junctions (OHJ), also termed micro-homology overlapping junctions (MHOJ), were rarely detected [Figure 2] [27,51,52,63]. Thus, among 11 different HBV insertional sites identified in HepaRG cells as VEIS, only one was of the OHJ type and the remaining ones were HTJ [51]. In the same study, analysis of liver biopsies obtained from woodchucks at 1 or 3 h p.i. showed that all (10/10) WHV-host junctions detected had the HTJ format [Figure 1 and Table 1]. The HTJ format of the earliest HBV-host fusions was also apparent in Huh7-NTCP cells examined at 24 h p.i. [63]. In the subsequent study of the HBV-infected HepG2-NTCP-C4 cell clone, six virus-host junctions were detected between 30 min and 24 h p.i., and, except for the single fusion with the RunX-1 site, all were HTJs [Table 1] [27]. Finally, in the WCM260 hepatocyte line infected with WHV, 12 virus insertions into the hepatocyte genome were identified between 15 min and 1 h p.i., all of the HTJ type [52]. Therefore, of the 43 virus-host genomic fusions identified in total during the first 24 h p.i. in the four studies discussed, only two (4.7%) were formed by micro-homology overlapping joining. This shows that both HBV and WHV DNA integrate into the hepatocyte genome in the initial stages of infection almost exclusively via HT joining. The formation of such junctions strongly indicates that the non-homologous end-joining (NHEJ) pathway was involved.
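The overall proportion of overlapping junctions follows from adding up the per-model counts given in this section. The Python sketch below performs that tally. The grouping by infection model is our reading of the text, and the Huh7-NTCP count of four is taken from the VEIS listed in the preceding section. `STUDIES` is a hypothetical name for this summary table.

```python
# Virus-host junctions detected within 24 h p.i., as summarized in the text.
# Each entry: (total junctions, overlapping homologous junctions).
STUDIES = {
    "HBV, HepaRG cells [51]": (11, 1),
    "WHV, woodchuck liver [51]": (10, 0),
    "HBV, Huh7-NTCP cells [63]": (4, 0),
    "HBV, HepG2-NTCP-C4 cells [27]": (6, 1),
    "WHV, WCM260 hepatocytes [52]": (12, 0),
}

total = sum(n for n, _ in STUDIES.values())
ohj = sum(k for _, k in STUDIES.values())
htj = total - ohj

print(f"total junctions: {total}")
print(f"HTJ: {htj} ({100 * htj / total:.1f}%)")
print(f"OHJ: {ohj} ({100 * ohj / total:.1f}%)")
```

Both the HepaRG cell and woodchuck liver data come from [51]. This is why five infection models correspond to four studies.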

MECHANISM OF INITIAL HEPADNAVIRUS INTEGRATION INTO HOST GENOME
Guided by the finding that the great majority of the very early HBV-host fusions were of the HTJ type, which implied their formation via the NHEJ pathway, and knowing that this pathway is primarily involved in the repair of cellular double-stranded DNA breaks [27,51,52], which are considered to be precursors of HBV DNA integration [102], we searched for a possible explanation connecting these two events. A natural starting point was provided by several studies showing that HBV infection can induce oxidative stress by prompting intracellular production of reactive oxygen species (ROS) and reactive nitrogen species (RNS), leading to DNA oxidation and oxidation-caused DNA breakage [103]. To explore whether this mechanism was in fact activated and could contribute to the creation of the very early virus-host DNA fusions, we examined woodchuck WCM260 hepatocytes infected with WHV beginning from 15 min p.i. [52]. Remarkably, a strong and protracted induction of ROS and transient generation of iNOS, coinciding with microscopically detectable DNA damage, became evident at 15 min after exposure to the virus [Figure 1]. While ROS reactivity progressively increased for up to 6 h p.i., iNOS reactivity was elevated only until 30 min p.i. In addition, cellular DNA damage, as assessed by tail moment in the alkaline comet assay, increased sharply over the period investigated, i.e., from 15 min to 1 h p.i. [Figure 1]. This suggested that ROS played a primary role in triggering DNA breakage immediately after exposure of WCM260 cells to WHV. Further, as already indicated in the preceding sections, the first WHV-host DNA fusions in this model were detected at 15 min p.i. [Table 1]. In this context, it has previously been shown that HBV infection increases levels of ROS and iNOS in human hepatoma HepAD38 cells; the increase was evident at the first time point observed, 24 h p.i., and peaked at 72 h p.i. during the 96-h observation period [104].
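The comet-assay readout mentioned above combines how far damaged DNA migrates with how much of it migrates. For reference only, the two tail-moment definitions in common use are given below. The text does not specify which variant was used in [52], so neither should be read as the study's exact metric.

```latex
% Extent tail moment: tail length scaled by the fraction of DNA in the tail
\mathrm{TM} = L_{\mathrm{tail}} \times \frac{\%\,\mathrm{DNA}_{\mathrm{tail}}}{100}

% Olive tail moment: distance between the head and tail centres of mass,
% scaled by the same fraction
\mathrm{OTM} = \left(\bar{x}_{\mathrm{tail}} - \bar{x}_{\mathrm{head}}\right) \times \frac{\%\,\mathrm{DNA}_{\mathrm{tail}}}{100}
```

Both measures grow with longer tails and with a larger share of fragmented DNA in the tail. A rising tail moment is therefore read as increasing DNA strand breakage.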
The induction of oxidative stress coincided with augmented expression of genes encoding proteins associated with responses to oxidative and metabolic stress, as well as heat shock proteins [104,105]. A similarly rapid response has been reported for other viruses: infection of hepatocyte-like HepG2 cells and J774 cells with Mayaro arbovirus increased known markers of oxidative stress, including ROS, total superoxide dismutase activity, and malondialdehyde, within 1 h of infection [106].
To ascertain whether repair of DNA breaks caused by oxidative stress was in fact involved in generating the very early virus-host DNA integrations, we examined transcription of two genes in WHV-infected WCM260 hepatocytes [Figure 3] [52]. The first was poly(ADP-ribose) polymerase 1 (PARP1), which plays a central role in recognizing double-strand DNA breaks and in their repair via the alternative NHEJ pathway. The second was X-ray repair cross-complementing protein 1 (XRCC1), the binding partner of PARP1. In addition, the kinetics of nicotinamide adenine dinucleotide (NAD+), an indicator of PARP1 activation; heme oxygenase-1 (HO1), a marker of pro-oxidative stress; 8-oxoguanine DNA glycosylase 1 (OGG1), an indicator of the response to oxidative damage; and PARP1 cleavage were evaluated in this model. The results showed time-synchronized induction of the PARP1 and XRCC1 genes, accompanied by significantly upregulated NAD+ and HO1 activity, all occurring at 15-30 min p.i., while PARP1 cleavage and OGG1 gene expression became significantly augmented at 5 h and 6 h p.i., respectively [Figure 1]. The results of these complementary quantitative measurements strongly indicate that WHV is an instant and very potent inducer of oxidative stress in the cells tested and that the PARP1/XRCC1-initiated NHEJ DNA repair machinery is involved in the creation of the initial WHV-host genomic junctions.
However, after its activation and a progressive increase for up to 12 h p.i., PARP1 transcription subsided and leveled off at approximately the level seen in uninfected WCM260 cells, where it remained until the end of the 72-h observation period [52]. In addition, the decline in PARP1 gene expression coincided with an increase in PARP1 protein cleavage, which was significantly augmented between 30 min and 12 h p.i. [Figure 1]. Together, these observations suggested that PARP1-dependent DNA repair may operate chiefly in the initial stages of infection. They also raised the possibility that de novo infection might be the main trigger of virus-host fusions formed via the NHEJ mechanism. Interestingly, it has been shown that HBx protein can bind to PARP1 and inhibit PARP1 enzymatic activity and the repair of DNA breaks [107]. That study suggested that the physical interaction between PARP1 and HBx may interfere with recruitment of the DNA repair complex. In this regard, HBx RNA and HBx protein have been identified in infected cells from 4 h after HBV infection onwards [108,109]. It was also found that HBx protein could play a role in initiating oxidative stress upon HBV infection [110]. This was supported by the finding that transfection of HepG2 cells with HBx induced ROS and activated the oxidative stress pathway within 48 h post-transfection [108,111]. Similar evidence came from ChangX-34 cells infected with HBV [112]. This particular study also showed that ROS production led to intracellular accumulation of HBx protein, which might be of pathogenic significance in liver disease [112]. Thus, the accumulated data indicate that formation of the very early virus-host DNA fusions via the PARP1-initiated DNA repair mechanism is a multifactorial process that could potentially be influenced by HBx protein.
In summary, recent studies offer the first insight into the mechanisms by which the initial and very early HBV DNA insertions into the human hepatocyte genome are formed. In contrast to previous notions, this happens within minutes of the first contact of the virus with the hepatocyte. The results imply central roles for virus-triggered oxidative stress, the resulting breaks in cellular DNA, and the consequent activation of the PARP1/XRCC1-mediated DNA repair machinery. The finding that the very early virus-host DNA junctions predominantly have the HTJ format is consistent with the direct link between PARP1 recognition of DNA damage and the NHEJ pathway of DNA repair. The involvement of the virus X protein in forming the earliest virus-host merges remains uncertain. However, once HBx is expressed, from several hours post-infection onwards, it has the potential to limit PARP1 activity.

CONCLUSIONS
The ability of HBV DNA to integrate into the human hepatocyte genome was recognized early on as a characteristic of the biology of this virus; nonetheless, the timing of integration remained unknown for decades until recently. Historically, it was thought that HBV integrates no earlier than when chronic hepatitis B is established, i.e., months after infection. This unsolved issue prompted our group and another group to examine the emergence of virus-host genomic fusions in the earliest stages of infection. Human hepatocyte-like cell lines infected with HBV, woodchucks experimentally infected with WHV, and a woodchuck hepatocyte line infected with WHV were used as infection models. In our studies, virus-specific inv-PCR amplification was supplemented with NAH to detect amplicons displaying virus-host DNA fusions, followed by their clonal sequencing to identify the joined viral and host sequences.
The data show that HBV integrates into the hepatocyte genome as early as 30 min after infection of HepG2 cells overexpressing NTCP, and within 1 h when HepaRG cells served as infection targets. Another group reported a similar finding for HBV-infected Huh7-NTCP cells. Importantly, such immediate viral DNA insertions were also evident in WHV-infected woodchucks when their liver biopsies were analyzed at 1 or 3 h p.i. Further, we found that host non-coding DNA elements were the prevailing targets of HBV and WHV DNA integration in the very early stages of infection. These included retrotransposons (particularly those of the LINE family) and transposons, as well as genes with transposable capabilities. Overall, they represented more than 60% of all insertional sites detected up to 24 h p.i. in four independent studies. These data suggest that, from the beginning of infection, HBV can engage mobile genetic elements and genes with translocational potential to prompt pro-oncogenic perturbations throughout the genomes of infected cells. These perturbations could compromise genome stability and augment or silence the expression of individual genes important to the development of HCC, and possibly of other liver and extrahepatic cancers associated with HBV infection. In addition, a variety of host genes encoding physiologically vital proteins were identified as very early fusion partners of HBV or WHV DNA.
To identify a mechanism facilitating formation of the first virus-host DNA merges, our approach was based on the observation that the great majority of the initial and very early fusions were created by non-homologous end joints, also called head-to-tail joints (HTJs), and very few by overlapping homologous end joining. This implied that their formation involved the PARP1/XRCC1-dependent NHEJ DNA repair pathway and that oxidative stress could be a culprit in this process. The WHV-WCM260 infection model was used to follow the kinetics of ROS and iNOS levels, cellular DNA damage, and expression of genes associated with the oxidative DNA damage response. All of these increased significantly at 15-30 min p.i., when the first virus-host junctions formed. The results show that the virus is an immediate and very potent inducer of oxidative cellular DNA damage. It also swiftly triggers the DNA repair machinery, which, with high probability, facilitates formation of the earliest virus-host fusions.
The initial and very early HBV-host genomic junctions likely set the stage for the subsequent pro-oncogenic perturbations resulting in HCC. Characterizing these junctions should bring new insights into the pathogenesis of pre-cancerous changes and help predict the dynamics of the oncogenic process. Relevant features include the viral and host sequences involved, their frequency of occurrence, and their distribution throughout the infected liver. Such characterization should also prompt ideas for novel biomarkers and for therapeutic approaches that slow down or halt the progression of these transformations.

Authors' contributions
Made substantial contributions to the concept, design and writing of the article: Chauhan R, Michalak TI

Availability of data and materials
Not applicable.