The role of genetic factors in HBV-related HCC: perspectives from local genetic backgrounds and clinical epidemiology

Familial clustering of hepatitis B surface antigen (HBsAg) carriers and hepatocellular carcinoma (HCC) has prompted evaluation of the role of genetics in hepatitis B-related diseases. Consistent reports indicate that the HLA-DP and -DQ loci are associated with persistent hepatitis B virus (HBV) infection. For hepatocarcinogenesis, however, existing studies have low power and conflicting data. Global single nucleotide polymorphism (SNP) data were collected from the 1000 Genomes Project and correlated with local epidemiological information. Southeastern Asia has a higher prevalence of HBsAg than Northeastern Asia; this difference was used to evaluate persistent HBV infection. The higher incidence of HCC in West Africa compared with East Africa was used to evaluate hepatocarcinogenesis. The allele frequencies of these SNPs were significantly different between East Asians and Africans. Therefore, SNPs that have been identified in persistent HBV infection in East Asia may not be completely applicable in Africa. SNPs in NTCP, TCF19, and the HLA-DQ and -DP loci showed North-to-South allele frequency changes in East Asia. These findings confirm the role of genetics in persistent HBV infection. Some of the SNPs in the HLA loci show a trend of West-to-East allele frequency changes in Africa, indicating that they may participate in hepatocarcinogenesis. Among the non-HLA SNPs, rs2596542 in MICA shows a strong trend of allele frequency change and is correlated with HCC incidence in Africa. SNPs in KIF1B, IL-1A, and STAT4 also show, albeit with low statistical power, allele frequency trends compatible with HCC incidence. Taken together, there are strong correlations between the genetic background at the HLA-DP and -DQ loci and both persistent HBV infection and hepatocarcinogenesis. The correlations were weakly positive in non-HLA loci.
Production Editor: Jing Yu
Tai et al. Hepatoma Res 2020;6:74 | http://dx.doi.org/10.20517/2394-5079.2020.54


INTRODUCTION
Hepatitis B is a global disease that increases the risk of liver cirrhosis and hepatocellular carcinoma (HCC) [1,2]. Strong familial clustering of chronic hepatitis B carriers and HCC has been well reported [3][4][5][6]. This could be related to the high intrafamilial spread of hepatitis B virus (HBV) infection. HBV infection early in life tends to result in chronic persistent infection [4]. Throughout the course of this disease, intermittent relapsing liver necroinflammation occurs. This process is a defensive response that suppresses HBV replication and/or clears the virus. In general, by 80 years of age, half of chronic hepatitis B carriers will have suppressed HBV replication and cleared hepatitis B surface antigen (HBsAg) [7].
However, some patients may develop liver cirrhosis as a result of repeated liver inflammation and fibrogenesis [8]. Liver necroinflammation can induce chromosomal damage [9], and HBV is able to integrate into the human genome [10]. Such injury to the host genome can induce chromosomal instability and promote hepatocarcinogenesis. Genetic factors associated with HCC have been considered because of its familial tendency. Many candidate gene studies and genome-wide association studies (GWAS) have linked nearly one hundred genes to chronic persistent infection or hepatocarcinogenesis [11][12][13][14]. Previous extensive and elegant meta-analyses have addressed these issues. However, not all of these HBV/HCC-related genes have been confirmed and replicated by subsequent studies, which demonstrates the difficulty of identifying HCC-associated genes. This is partly because HBV-host interactions are not simply a genetic problem, and partly because the genetic backgrounds of study populations differ. Therefore, in this study, we attempt to interpret HBV-related single nucleotide polymorphisms (SNPs) by correlating genetic backgrounds with two well-known epidemiological observations. The genetic backgrounds are obtained from the 1000 Genomes Project (http://www.1000genomes.org/) [15]. The higher prevalence of HBsAg in Southeastern compared with Northeastern Asia is used to evaluate persistent HBV infection [16]. The higher incidence of HCC in West Africa compared with East Africa is used to evaluate hepatocarcinogenesis (World Health Organization, http://gco.iarc.fr/today). From these viewpoints, we may obtain additional information about HBV-related genetic polymorphisms that is independent of the observations in previously reported studies.

CHRONIC PERSISTENT HBV INFECTION
Hepatocarcinogenesis in chronic HBV infection is not a purely genetic disease; it depends on host and virus interactions. Persistent HBV infection is the first stage toward hepatocarcinogenesis.

HBV clearance
When humans are exposed to HBV, either acute hepatitis with spontaneous viral clearance or chronic persistent infection may develop [Figure 1]. Both the timing and the transmission route of infection are important in persistent infection. Individuals who are infected early in life and via vertical transmission are more likely to develop persistent infection [4]. In GWAS, HLA-DP and -DQ have consistently been shown to be associated with persistent HBV infection in East Asians [17][18][19][20][21][22][23]. However, such high-risk alleles are relatively rare in Africans [24]. Therefore, HBV infection acquired early in life remains an important mechanism of persistent HBV infection, one that is independent of genetic polymorphisms in the HLA-DP and -DQ loci.

Persistent HBV infection
Chronic persistent HBV infection starts with the presence of the hepatitis B e-antigen (HBeAg). During the HBeAg-positive immune tolerance phase, the HBV DNA level is high with low levels of liver inflammation [25]. Due to some unclear triggering factors, immune clearance of HBV will develop between the second and fourth decades of life in East Asians. There is a difference between Africans and East Asians regarding the duration of the HBeAg-positive period [26][27][28][29]. A rapid clearance of HBeAg before puberty can be found in Africans but rarely occurs in East Asians [30,31]. This difference may be associated with variations in the HLA-DP and -DQ genotypes [24]. Certain SNPs in the HLA-DP and -DQ loci may prolong the HBV replication phase in adults of East Asian descent. A high HBV DNA level in parents will increase the risk of persistent infection in their offspring (submitted for publication).
HBV genotypes C and D are associated with more persistent liver necroinflammation [28]. Many immune-related genes may participate in this immune clearance phase [Figure 1]. Patients in either the HBeAg-positive or the HBeAg-negative phase who are unable to suppress HBV replication may develop liver cirrhosis due to repeated liver inflammation and fibrogenesis.

HBV integration and host interaction
A prolonged HBV replication phase and liver cirrhosis increase the risk of HCC [25]. The mechanism may be related to the hepatitis B x (HBx) protein [32], increased endoplasmic reticulum stress [9], HBV integration [10,33], and inflammation-related chromosomal damage.

Figure 1. When humans are exposed to HBV, either acute hepatitis with subsequent virus clearance or persistent infection may develop. Those who are infected in the early stages of life, and via vertical transmission, are prone to progress to persistent infection. HLA-DP and -DQ loci are associated with persistent HBV infection in East Asians. Chronic persistent HBV infection starts with the presence of hepatitis B e-antigen (HBeAg), known as the immune tolerance stage. In this phase, Africans tend to clear HBeAg before puberty while East Asians tend to clear HBeAg between the second and fourth decades of life. About 5% of HBeAg-negative patients still present with active viral replication. Many immune-related genes participate in HBV clearance. Inability to effectively clear HBV may result in liver cirrhosis and increased HBV mutations and integrations into the human genome. All of these events and some genetic polymorphisms in the host may promote hepatocarcinogenesis. HCC: hepatocellular carcinoma

Soon after HBV infection, part of the HBV genome can be integrated into the host genome. The mechanism of HBV integration is not fully understood. In an in vitro study using a hepatoma cell line transfected with the Na+-taurocholate co-transporting polypeptide (NTCP), HBV integration could be detected at random sites in the host genome shortly after infection [33]. This observation implies that HBV integration is independent of immune-related inflammation. These HBV integrations are mostly harmless but may produce genomic instability. After decades, some of the integrations may become more prominent.
In the presence of additional factors, such as inflammation, HBV mutations, or environmental carcinogens, a subset of the hepatocytes carrying HBV integrations may become clonal and develop into HCC. Several integration hotspots, including TERT, KMT2B, DDX11L1, CCNE1, and CCNA2, are found more frequently in HCC than in non-HCC tissues [34], while integration in FN1 is commonly found in non-tumour tissues. A prolonged HBeAg phase and a high viral load are associated with a higher frequency of HBV integrations [35].

HBV mutants and host interaction
The wild-type HBV genome and its proteins are not directly cytopathic. Host immune responses and inflammation are induced to clear HBV during the second to fourth decades of life in East Asians. If the HBV immune clearance process is unsuccessful and prolonged, complex HBV mutations may develop and escape immune surveillance. With repeated necroinflammation, several mutation hotspots in the EnhII (C1653T)/BCP (A1762T/G1764A, T1753V)/PC (G1896A) regions have been found more frequently in HCC [36]. In addition, some of the pre-S/S mutations or truncations may become directly cytopathic and/or carcinogenic [37].

Global allele frequency of HBV-related SNPs in HLA loci
Based on GWAS, the HLA-DP and -DQ loci are associated with persistent HBV infection. These SNPs have been reported quite consistently by different centres in East Asia.
To understand the global allele frequencies of the HBV-related SNPs, we collected data from the 1000 Genomes Project; the results are listed in Table 1. Although HBsAg prevalence in East Asians is as high as in Africans, the two populations did not show similar allele frequencies at these SNPs. In general, the allele frequencies of these SNPs are significantly different between East Asians and other global populations. Only 5/19 (26.3%) SNPs showed similar allele frequencies between populations from East Asia and Africa [Table 1]. Therefore, these HBV-related SNPs in the HLA-DP and -DQ loci may not completely explain the high prevalence of HBsAg in Africa.
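As an illustration of how such a between-population comparison of allele frequencies can be made, the following minimal Python sketch applies a Pearson chi-square test to a 2 × 2 table of allele counts. It is a sketch under stated assumptions, not the method used for Table 1: the text does not specify which statistical test was applied, and the function name, the frequencies, and the chromosome counts in the usage lines are hypothetical.

```python
import math


def allele_freq_chi2(freq_a, n_chrom_a, freq_b, n_chrom_b):
    """Pearson chi-square test (1 df, no continuity correction) comparing the
    frequency of one allele between two populations.

    freq_*    : allele frequency in each population (0..1)
    n_chrom_* : chromosomes sampled (2 x number of individuals)
    Returns (chi2, p_value).
    """
    # Rebuild approximate allele counts from the reported frequencies.
    a1 = freq_a * n_chrom_a          # allele present, population A
    a0 = n_chrom_a - a1              # allele absent, population A
    b1 = freq_b * n_chrom_b
    b0 = n_chrom_b - b1
    n = n_chrom_a + n_chrom_b
    row1, row0 = a1 + b1, a0 + b0
    if row1 == 0 or row0 == 0:
        return 0.0, 1.0              # allele absent or fixed in both populations
    chi2 = n * (a1 * b0 - a0 * b1) ** 2 / (n_chrom_a * n_chrom_b * row1 * row0)
    # For 1 degree of freedom, the chi-square survival function is erfc(sqrt(x/2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical usage: frequencies and sample sizes are illustrative only.
chi2, p = allele_freq_chi2(0.80, 1000, 0.40, 1000)
print(f"chi2 = {chi2:.1f}, p = {p:.3g}")
```

Working from counts rather than raw frequencies matters because the same frequency difference is far less convincing in a small sample than in a large one.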
We suspect that the evolution of these SNPs may be related to human migration [24]. The Indo-China peninsula and southern China are mountainous and forested areas. Such geographic environments are associated with a great diversity of microorganisms, insects, plants, and animals. People who can survive in this milieu may need some adjustments to their immunity, lest they succumb to a cytokine storm after exposure to multiple unfamiliar microorganisms. Because of modified antigen presentation resulting from the HLA-DP and -DQ loci, immunity may be decreased or separated into several stages to avoid the development of a cytokine storm. Unfortunately, such immunity may also allow HBV infection to become chronic and persistent. HBV clearance is delayed, but clearance may finally occur several decades later. As a matter of fact, only a minority of HBsAg carriers die of acute or chronic liver disease, and around half of chronic HBsAg carriers clear HBsAg by 80 years of age [7].

Mechanism of persistent HBV infection in HLA-DP and -DQ SNPs
Both rs3077 and rs9277535 were identified by GWAS to be associated with persistent HBV infection in Japanese patients [18]. The A alleles of rs3077 and rs9277535 are associated with higher mRNA expression than the G alleles [38]. The prevalence of the A allele is lower in East Asia than in other geographic areas [Table 1]. This may suggest that antigen presentation and the immune response associated with the G allele are weaker than those associated with the A allele. Such behaviour may favour persistent HBV infection.
The SNP rs7756516 in HLA-DQB2 was associated with persistent HBV infection [23]. We checked potential microRNA binding using the SegalLab tool (http://genie.weizmann.ac.il/pubs/mir07/mir07_prediction.html) and found that microRNA-550 (miR-550) may bind to the G allele of this SNP. Binding of miR-550 would decrease mRNA stability, which is a potential reason for a low mRNA level and weak antigen presentation. The frequency of the G allele is 0.806 in East Asians [Table 1], which is much higher than in other areas worldwide (0.369-0.665).

HBV-related SNPs in the HLA region among East Asians
The human migration theory was based on a geographic block on the Indo-China peninsula. After crossing this region, the ancestors of East Asians spread to Northern China, Korea, and Japan.
Northern China is largely grassland and is associated with a lower prevalence of HBsAg than southern China [16]. With this in mind, we examined the allele frequencies in East Asian populations. We proposed that the lower HBsAg prevalence in Northeast Asia could be related to genetic polymorphisms.
The allele frequencies of HBV-related SNPs in East Asia were obtained from the 1000 Genomes Project [Table 2]. A zone in the HLA-DP and -DQ regions showed a trend of allele frequency change according to HBsAg prevalence and geographic location. Background genetics may therefore explain the lower prevalence of HBsAg in Northeast versus Southeast Asians. Alternatively, this observation could simply reflect ethnic differences between Northeast and Southeast Asia. Although that explanation is plausible, such trends were not found in pseudogene regions [Table 2]. We therefore suggest that only active genes participated in environmental evolution or adaptation. This is additional evidence supporting the role of geographic blocks in the evolution of HBV-related SNPs in the HLA regions.
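A North-to-South trend in allele frequency across several ordered populations can be examined with a trend test. The sketch below uses the Cochran-Armitage test for trend as one possible choice. This is an assumption: the text does not state which trend test produced the reported P values, and the function name, the population ordering, and the counts in the usage lines are hypothetical.

```python
import math


def cochran_armitage_trend(allele_counts, chrom_totals, scores=None):
    """Cochran-Armitage test for a linear trend in allele frequency.

    allele_counts : copies of the allele observed in each population
    chrom_totals  : chromosomes sampled in each population
    scores        : ordinal position of each population (default 0, 1, 2, ...),
                    e.g. ordered from north to south
    Returns (z, two_sided_p).
    """
    k = len(allele_counts)
    if scores is None:
        scores = list(range(k))
    n_total = sum(chrom_totals)
    p_bar = sum(allele_counts) / n_total      # pooled allele frequency
    # Score-weighted sum of observed minus expected allele counts.
    t_stat = sum(t * (r - n * p_bar)
                 for t, r, n in zip(scores, allele_counts, chrom_totals))
    s1 = sum(n * t for t, n in zip(scores, chrom_totals))
    s2 = sum(n * t * t for t, n in zip(scores, chrom_totals))
    variance = p_bar * (1 - p_bar) * (s2 - s1 * s1 / n_total)
    if variance <= 0:
        return 0.0, 1.0                        # no variation to test
    z = t_stat / math.sqrt(variance)
    p_value = math.erfc(abs(z) / math.sqrt(2)) # two-sided normal tail
    return z, p_value


# Hypothetical usage: three populations ordered north to south.
z, p = cochran_armitage_trend([10, 20, 30], [100, 100, 100])
print(f"z = {z:.2f}, p = {p:.3g}")
```

A positive z indicates that the allele becomes more frequent along the chosen ordering; reversing the ordering flips the sign but leaves the P value unchanged.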

Hepatocarcinogenesis in HLA loci
Hepatocarcinogenesis is a multifactorial process. There is strong evidence for the role of genetics in persistent HBV infection. However, genetic reports on HBV-related hepatocarcinogenesis remain controversial. When we examined the global incidence of HBV-related HCC, a higher incidence of HCC was found in West Africa than in East Africa. Based on this trend, we examined the allele frequency distribution of the reported SNPs and correlated it with HCC incidence in different geographic regions of Africa. It should be noted that the mechanism of hepatocarcinogenesis can differ among regions. For example, aflatoxin or other environmental factors may be important in Africans [39]. On the other hand, a long active HBV replication phase is the key factor in East Asians [26][27][28][29].
Among the HBV-related SNPs in the HLA region, five SNPs (rs2856718, rs9275572, rs3077 and rs9277341) showed a trend of West-to-East allele frequency change in Africans (P < 0.00001; Table 3). These SNPs were mainly located in the HLA-DQ and HLA-DPA1 regions; the distribution of these HCC-related SNPs in the HLA regions was similar to that observed in persistent HBV infection [Table 2]. All of these SNPs, except rs9277341, have been reported to be associated with a greater risk of HCC. The allele frequency of rs9277341 differed significantly between West and East (P < 10⁻⁷), but no study has examined its effect on the risk of developing HCC. Further studies of this SNP may be needed. The mechanism of hepatocarcinogenesis is probably related to persistent HBV replication and repeated liver necroinflammation.

Non-HLA SNPs in relation to HBV infection and hepatocarcinogenesis
Many SNPs in non-HLA loci have also been reported to be associated with HBV infection and/or hepatocarcinogenesis. The associations with HBV infection reported for non-HLA SNPs were generally weaker than those in the HLA regions. Moreover, many SNPs related to HBV persistence or carcinogenesis could not be replicated in other studies. This is at least in part due to the different genetic backgrounds across study populations. We saw significant allele frequency differences between Africans and East Asians. Only 2/20 (10%) SNPs showed a similar allele frequency between the two populations [Table 4].

HBV-related SNPs in the non-HLA region among East Asians
Most of the persistent HBV infection-related SNPs in non-HLA regions were reported in East Asia. We examined whether these SNPs also showed geographical differences in allele frequency between Northern and Southern regions. We found that only 2 of 20 (10%) SNPs showed a significant North-to-South allele frequency trend in East Asians [Table 5]. Of these, the SNP in NTCP, a functional receptor for hepatitis B virus [40], showed the strongest trend (P = 2.23 × 10⁻⁶).
The T allele of rs2296651 is a missense mutation [41] and is found only in East Asians [Table 4]. The T-allele frequency is higher in Southern East Asia (0.111-0.118) than in Northern East Asia (0.024-0.029) [Table 5]. However, the T allele is known to protect against persistent HBV infection. A higher T-allele frequency in a region with a high prevalence of HBsAg therefore requires an explanation. We propose that the T allele is an evolutionary mechanism to defend against persistent HBV infection in the presence of a weakened antigen presentation system.
The SNP rs1419881 in transcription factor 19 (TCF19) shows a significant allele frequency difference between the Northern and Southern regions (P = 1.08 × 10⁻⁵; Table 5). This GWAS-identified SNP was found to be associated with persistent HBV infection in Korea [21]. The association was validated in China but was not found in the Thai population [42]. The G allele is the risk-associated allele, and it showed a higher frequency in Japanese people in Tokyo, Japan (JPT; 0.5) than in Chinese Dai people in Xishuangbanna, China (CDX; 0.29). This frequency trend was inversely related to HBsAg prevalence [Table 5]. TCF19 mainly plays a role in the transcription of genes required in the later stages of cell cycle progression. Its mechanism in persistent HBV infection is unclear. Whether TCF19, like NTCP, reflects an increased defensive response in people living in regions with a high HBsAg prevalence will require future studies.

Hepatocarcinogenesis in non-HLA loci
The major histocompatibility complex class I-related chain A (MICA) was reported to be associated with HCV-related HCC [43]. Among the non-HLA HBV-related SNPs, only rs2596542 in MICA showed a significant trend (P = 0.00011; Table 6). Its C allele frequency is lower in West Africa than in East Africa. The C allele is protective against hepatocarcinogenesis, whereas the T allele is a risk factor [44]. These findings correlate with the higher incidence of HCC in West than in East Africa. The MICA molecule is a ligand of the natural killer group 2 member D (NKG2D) molecule, which is involved in natural killer cell function. Some tumour cells may release soluble MICA molecules to block immune surveillance [45,46]. The remaining non-HLA SNPs show a weak or absent West-to-East allele frequency trend. This is consistent with each HBV-related SNP making only a small individual contribution to hepatocarcinogenesis.
The rs17401966 SNP lies in kinesin family member 1B (KIF1B), a tumour suppressor gene. It has been identified by GWAS to be associated with HCC [47], although there is some controversy in the subsequent validation studies. A meta-analysis has revealed that the G allele is a protective allele in the Chinese population [48]. This G allele shows a higher prevalence in East Asia than in Africa (0.288 vs. 0.057, P < 0.001; Table 4). Interestingly, there is a trend towards a higher G allele frequency in the Luhya people in Webuye, Kenya (LWK), East Africa, than in the Gambian people in Western Divisions in the Gambia (GWD), West Africa (0.096 vs. 0.04, P = 0.009852; Table 6). This is compatible with the lower incidence of HCC in East Africa than in West Africa.
The STAT pathway and its associated cytokines play roles in hepatitis clearance and fibrogenesis [49,50]. The rs7574865 SNP in signal transducer and activator of transcription 4 (STAT4) is related to persistent HBV infection and HCC. The T allele shows a lower prevalence in HCC than in chronic hepatitis B [51]. There is a weak East-to-West T-allele frequency trend in Africa (0.131 in LWK and 0.071 in GWD; P = 0.05561; Table 6). Although the trend is weak, the higher incidence of HCC in areas with a lower T-allele frequency supports the conclusion made by the meta-analysis.
The SNP rs16347 in the 3'-untranslated region of interleukin-1alpha (IL-1A) carries a miRNA-122 binding site. A variant with a TGAA insertion decreases miRNA-122 binding and increases IL-1A mRNA expression [52]. The prevalence of this insertion variant is low in Southern Chinese patients with HCC. However, another study from China did not reproduce this result and instead associated the insertion variant with HBV genome mutants [53]. When we looked at the allele frequency in Africa, the frequency of the TGAA insertion variant was lower in West Africa than in East Africa. This allele frequency pattern correlated with the lower incidence of HCC in East Africa than in West Africa [Table 6]. It should be noted that the function of miRNA-122 is complicated and that the frequency of this insertion variant in East Asians (0.704) is much higher than in Africans (0.199; Table 4). If this SNP is associated with HCC, then it will have a significant impact in East Asians.
Rs187238 is located upstream of interleukin-18 (IL-18) (-148 G>C). The G allele induces increased IL-18 mRNA expression compared with the C allele [54]. The frequency of the G allele is lower in HCC cases than in non-HCC cases. This implies that those with stronger immunity may be able to control HBV and hepatocarcinogenesis. However, a follow-up study did not validate the allele differences between HCC and non-HCC cases [55]. In this review, the G-allele frequency was higher in West than in East Africa (0.248 vs. 0.152; Table 6), which contradicts the higher incidence of HCC in West Africa. The incidence data therefore do not support a role for rs187238 in hepatocarcinogenesis in Africans. This does not necessarily provide evidence against the association of this SNP with hepatocarcinogenesis in East Asians. Termination of HBV replication is more important in East Asians than in Africans.
Rs1048338 in PRSS23 has been identified by GWAS to be associated with HCC in China [56]. However, no other report has validated this observation. A trend towards a higher C allele frequency in West compared with East Africa (0.412 vs. 0.273, P = 0.001384; Table 6) has also been observed. This finding does not argue against the association of rs1048338 with hepatocarcinogenesis. However, more studies are needed to confirm its association with HBV-related HCC.
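The reasoning applied to each SNP in this section, namely whether the West-versus-East difference in allele frequency points in the same direction as the difference in HCC incidence, can be sketched as a simple consistency check. The function below is hypothetical and ignores statistical significance. The frequencies in the usage lines are those quoted above from Table 6; treating the STAT4 T allele and the IL-18 G allele as "protective" is our reading of the statement that each is less frequent in HCC cases.

```python
def direction_consistent(allele_effect, freq_west, freq_east,
                         incidence_higher_in="west"):
    """Check whether an allele frequency gradient agrees with HCC incidence.

    allele_effect       : "risk" or "protective"
    freq_west/freq_east : allele frequency in West and East Africa
    incidence_higher_in : region with the higher HCC incidence
    Returns True (consistent), False (contradictory), or None (no gradient).
    """
    if allele_effect not in ("risk", "protective"):
        raise ValueError("allele_effect must be 'risk' or 'protective'")
    if freq_west == freq_east:
        return None
    allele_higher_in = "west" if freq_west > freq_east else "east"
    if allele_effect == "risk":
        # A risk allele should be more common where incidence is higher.
        return allele_higher_in == incidence_higher_in
    # A protective allele should be more common where incidence is lower.
    return allele_higher_in != incidence_higher_in


# Frequencies quoted in the text (GWD = West Africa, LWK = East Africa).
print(direction_consistent("protective", 0.04, 0.096))   # KIF1B rs17401966 G: True
print(direction_consistent("protective", 0.071, 0.131))  # STAT4 rs7574865 T: True
print(direction_consistent("protective", 0.248, 0.152))  # IL-18 rs187238 G: False
```

Such a check only captures the direction of the gradient; whether the gradient is larger than expected by chance still requires a formal test on the allele counts.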

CONCLUSION
We confirm that there are significant differences in genetic background between Africa and East Asia. By correlating genetic backgrounds with clinical epidemiology, we have found that the allele frequencies at the HLA-DQ and -DP loci do explain the higher prevalence of HBsAg in Southeastern compared with Northeastern Asia. Some of these SNPs also showed West-to-East changes in allele frequency in Africa and are correlated with HCC incidence. For the non-HLA loci, SNPs in NTCP and TCF19 showed North-to-South allele frequency trends in East Asians, supporting their association with defence against persistent HBV infection. There is a strong correlation between allele frequency and HCC incidence for the SNP located in MICA and weak positive correlations for those in KIF1B, STAT4, and IL-1A. Studies of genetic factors in hepatocarcinogenesis are difficult because multiple factors are involved and genetic backgrounds differ among study populations.