Pathway analysis provides insight into the genetic susceptibility to hepatocellular carcinoma and insight into immuno-therapy treatment response

Clear evidence exists for genetic susceptibility to hepatocellular carcinoma (HCC). Genome-wide association studies have identified multiple candidate susceptibility loci. These loci suggest that genetic variation in the immune system may underpin HCC susceptibility. Genes for the antigen processing and presentation pathway have been observed to be significantly enriched across studies, and the pathway is identified directly through genome-wide studies of variation using pathway methods. Detailed analysis of the pathway indicates that variation in both the antigen presenting loci and the antigen processing genes differs between cases and controls. Pathway analysis at the transcriptional level also shows differences between normal liver and liver in individuals with HCC. Assessing differences in the pathway may prove important in improving immune therapy for HCC and in identifying responders for immune checkpoint therapy.


INTRODUCTION
Hepatocellular carcinoma (HCC), the most common form of primary liver cancer, is ranked 5th in global incidence and 2nd in mortality [1]. With the exception of East Asia, the incidence of HCC is increasing in almost all regions of the world and has doubled in the USA since the early 1980s [2]. This increase is attributable to increases in obesity and type II diabetes [3,4]. Liver cancer's 5-year survival is the second worst among all cancers (18.1%) [5].
In this manuscript, the role of genetic susceptibility to HCC is examined. Novel tools that evaluate genetic data using collections of genes and their interactions within biologic networks are used to identify key biologic processes driving susceptibility. The relationship of germline and somatic variation is explored. The importance of these findings is assessed in the context of current therapeutic interventions for HCC.

GENETIC SUSCEPTIBILITY TO HCC
In contrast to other common tumors, genetic susceptibility to HCC remains poorly characterized. Studies have identified evidence for familiality of HCC, over and above familial exposures such as HBV infection [9][10][11][12][13][14]. For example, after accounting for HBV infection, individuals with a family history of HCC have a rate ratio of 2.4 [10]. To date, these studies have examined only hepatitis virus-associated HCC and have yet to explore the role of obesity- and diabetes-related susceptibility.
A limited number of studies have been conducted to identify the loci underpinning this familiality. Original studies focused on candidate genes whose observed single nucleotide polymorphisms (SNPs) could plausibly modify known environmental risk factors for HCC, including aflatoxin, alcohol, or tobacco. A meta-analysis of these studies found associations with five genes: HFE, IL-1B, MnSOD, MDM2, and UGT1A7 [15].
Only a small number of genome-wide association studies (GWAS) have been conducted for HCC, with modest success in identifying risk loci. The NHGRI-EBI Catalog lists a total of 11 studies that have identified 22 loci [16]. These studies examine East Asian populations and have included HCC associated with hepatitis B virus (HBV), hepatitis C virus (HCV), and non-alcoholic steatohepatitis (NASH) etiologies. The studies have identified SNPs in the genomic proximity (intronic, upstream and/or downstream) of twenty protein coding loci.
Clues to the biologic basis of HCC susceptibility across GWAS can be identified by looking for nonrandom enrichment. Using the resources of the Gene Ontology consortium (GO) (http://geneontology.org), the twenty protein coding loci were examined for biologic process enrichment in Homo sapiens. This enrichment analysis uses the tools of Panther (http://pantherdb.org/webservices/go/overrep.jsp). Four high level GO processes were observed to be significantly enriched: "T cell receptor signaling pathway" (P = 0.0366), "interferon-gamma-mediated signaling pathway" (P = 0.0026), "T cell costimulation" (P = 0.0020), and "antigen processing and presentation of exogenous peptide antigen via MHC class II" (P = 0.0001).
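The logic of an over-representation test of this kind can be illustrated with a one-sided hypergeometric tail probability. The sketch below is not the Panther implementation, whose test statistic and multiple-testing correction may differ; the function name and all counts in the example are hypothetical and chosen only for illustration.

```python
from math import comb


def hypergeom_enrichment_p(population: int, annotated: int,
                           sample: int, hits: int) -> float:
    """One-sided over-representation P value, P(X >= hits).

    population: number of genes in the reference set
    annotated:  number of reference genes carrying the GO term
    sample:     number of genes in the list being tested
    hits:       number of listed genes carrying the GO term
    """
    upper = min(annotated, sample)
    total = comb(population, sample)
    # Sum the hypergeometric probabilities for hits, hits + 1, ..., upper.
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


# Hypothetical counts: 20 listed genes, 3 of which carry a term that
# annotates 100 of 20000 reference genes.
p_example = hypergeom_enrichment_p(20000, 100, 20, 3)
```

A small P value indicates that the term appears among the listed genes more often than expected if the list had been drawn at random from the reference set.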
We have previously looked for inherited susceptibility using genome-wide genotyping and a novel analytic approach that uses biologic networks, Pathways of Distinction Analysis (PoDA) [17]. In PoDA, the network is the unit of analysis and accounts for interactions among features within the network. In this analysis, "antigen processing and presentation" was identified as having significant differences in variability in a population of Korean HBV-associated HCC cases and controls. Consistent with the results of the enrichment analysis, re-analysis of this dataset with an extended set of 1200 pathways again identified "antigen processing and presentation", but also "interferon gamma signaling", "TCR signaling", and "T cell receptor signaling pathway" [Table 1], suggesting that immune response may be a key driver of HCC susceptibility.

THE ROLE OF ANTIGEN PROCESSING AND PRESENTATION IN HCC
To assess what might be the key factors within "antigen processing and presentation", we performed an analysis using a modified version of PoDA and the Korean HCC dataset. In this analysis, all 400 of the SNPs genotyped in the dataset for the 65 genes in the pathway were contrasted in the cases and controls. After assessing significance of the odds ratio for the entire set of SNPs, each individual SNP was removed one at a time from the dataset and the significance was re-assessed. The SNP which least affected the significance of the odds ratio was then removed and the process was repeated. SNPs were progressively removed in this "stepdown" procedure until the significance of the odds ratio was no longer improved. Interestingly, it was observed that initial removal of SNPs substantially improved significance of the difference between cases and controls. When stepdown was completed, a total of 49 SNPs in 26 genes were observed [Table 2].
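The stepdown procedure can be sketched as a greedy backward elimination. The sketch below reflects one reading of the description: at each step it drops the SNP whose removal leaves the most significant remaining set, and it stops when no single removal improves significance. It is not the modified PoDA code; the scoring function is passed in as a parameter, and the `toy_p` function and SNP names are hypothetical placeholders standing in for the pathway-level odds ratio test.

```python
from typing import Callable, FrozenSet, Iterable, Tuple


def stepdown(
    snps: Iterable[str],
    p_value: Callable[[FrozenSet[str]], float],
) -> Tuple[FrozenSet[str], float]:
    """Greedy backward elimination driven by a set-level P value."""
    current = frozenset(snps)
    best_p = p_value(current)
    while len(current) > 1:
        # Score every set obtained by dropping a single SNP.
        candidates = [(p_value(current - {s}), s) for s in sorted(current)]
        cand_p, dropped = min(candidates)
        if cand_p >= best_p:
            break  # no single removal improves significance
        current = current - {dropped}
        best_p = cand_p
    return current, best_p


# Hypothetical scoring function: SNPs in SIGNAL lower the P value,
# all other SNPs dilute the signal and raise it.
SIGNAL = {"rs_a", "rs_b"}


def toy_p(subset: FrozenSet[str]) -> float:
    signal = len(subset & SIGNAL)
    noise = len(subset - SIGNAL)
    return min(1.0, 0.5 ** signal * 1.5 ** noise)


kept, final_p = stepdown(["rs_a", "rs_b", "rs_c", "rs_d"], toy_p)
```

In this toy setting the uninformative SNPs are removed first and the procedure halts once only informative SNPs remain, which mirrors the observation that early removals substantially improve significance.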
While the genes identified included key genes seen in the GWAS catalog, specifically members of HLA class II, other genes associated with antigen processing were also observed [Figure 1]. The design of genome-wide association studies does not permit identification of the specific etiologic effects of the variation. By design, the variation used in the studies is chosen not for function but for the ability to test differences between populations. The high linkage disequilibrium observed between variations in humans further complicates the capacity to interpret the molecular mechanisms of action.
Nevertheless, this study identifies variation of genes of potential significance in etiology. Of particular interest are the proteasome (HSPA2, HSPA4, HSPA5, HSP90AB1), endoplasmic reticulum (TAP1, TAP2, CANX), and exosome (LGMN) genes associated with the processing of antigens so that they may be presented by HLA loci. The pathway also identifies genes on the surface of immune cells, namely NK cells (KIR2DL3, KIR2DL4, and KIR2DL5) and CD4 T cells (CD4), that may compromise immune surveillance and regulation.
It is possible to examine the intra-pathway associations of the variants. Using the analytic tool PLINK [18], one can estimate the association (r²) between loci in cases and controls [Table 3]. As expected from the PoDA analysis, variants within the pathway are associated with one another. Associations are observed both between variants within a locus and between variants at different loci. Interestingly, the magnitude of associations differs between cases and controls. This confirms that the pathway utilizes information (interactions between loci) that would not be observed in simple single locus GWAS assessments.
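The pairwise association statistic can be illustrated as the squared Pearson correlation between allele counts (0, 1, or 2 copies of the minor allele) at two SNPs, computed separately in cases and in controls. This is a minimal sketch and not a substitute for PLINK, whose estimator and handling of missing genotypes may differ; the function name and the genotype vectors are hypothetical.

```python
from typing import Sequence


def genotype_r2(x: Sequence[int], y: Sequence[int]) -> float:
    """Squared Pearson correlation between two allele-count vectors."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length vectors of length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("r2 is undefined for a monomorphic SNP")
    return (sxy * sxy) / (sxx * syy)


# Hypothetical allele counts at two SNPs, split by disease status.
cases_snp1 = [0, 1, 2, 1, 0, 2, 1, 1]
cases_snp2 = [0, 1, 2, 1, 0, 2, 0, 1]
controls_snp1 = [0, 1, 2, 1, 0, 2, 1, 1]
controls_snp2 = [2, 0, 1, 1, 0, 2, 1, 0]

r2_cases = genotype_r2(cases_snp1, cases_snp2)
r2_controls = genotype_r2(controls_snp1, controls_snp2)
```

Comparing the two values for each pair of SNPs shows whether the strength of association differs by disease status, which is the contrast summarized for the real data.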

"ANTIGEN PROCESSING AND PRESENTATION" TRANSCRIPTIONAL ACTIVITY
It is possible to assess whether the germline variation in "antigen processing and presentation" translates into functionally significant differences between normal liver, tumor-adjacent liver, and HCC. This can be done by examining the transcriptome of these tissues using publicly accessible data from the Genotype-Tissue Expression project (GTEx) [19][20][21] and the TCGA [8]. Data from both sources were processed with a common analytic pipeline that included realignment of sequencing reads to Hg38 [22,23], uniform count scoring [24], and adjustment for over-dispersion [25,26].
The scored transcript data were then evaluated using the novel pathway analysis tool PathOlogist [27][28][29]. PathOlogist utilizes the logical information contained within networks to compute network scores. By utilizing the structure of a network, this approach lets the conditional state of genes determine expectations for the state of other members of the network. Two different scores are provided. The first, activity, assesses whether the activity state of the network differs. The second, consistency, assesses the logical state of the network, that is, whether the transcription patterns follow the expected logic of the network.
Examination of the transcriptional state of "antigen processing and presentation" provides additional insight into the susceptibility findings. First, "antigen processing and presentation" activity is observed to be significantly higher in normal liver (GTEx) than in TCGA tumor-adjacent liver (adjusted P < 0.0001) and tumor (adjusted P < 0.0001), while no difference is observed between tumor-adjacent liver and tumor (adjusted P = 0.87). This suggests that individuals with HCC have an "antigen processing and presentation" profile, in both their non-tumor and tumor tissue, that differs from that of normal liver.
For "antigen processing and presentation", no significant difference in consistency scores is observed between normal liver (GTEx) and TCGA tumor-adjacent liver (adjusted P = 0.64) or between tumor-adjacent liver and tumor (adjusted P = 0.89). However, a significant difference is observed between normal liver and tumor (adjusted P < 0.0001). This suggests that "antigen processing and presentation" may be a target of mutagenesis in HCC.

IMMUNE CHECKPOINT THERAPY AND "ANTIGEN PROCESSING AND PRESENTATION"
"Antigen processing and presentation" may be an important mediator of treatment response for HCC. Immune checkpoint therapy is dramatically altering the cancer therapeutic landscape [30]. Checkpoint therapy targets inhibitory signals to the immune system such as CTLA-4 and PD-1/PD-L1. These treatments show promising, durable responses in previously treatment-resistant cancers such as melanoma [31] and non-small cell lung cancer [32]. The US FDA has approved checkpoint therapy for second line treatment of HCC. Numerous studies are in progress to assess its efficacy as 1st line treatment (clinicaltrials.gov). Unfortunately, only a minority of individuals respond to the treatments [33]. It is unknown what mediates response. Indicators of response include DNA mismatch repair capabilities [34] and tumor mutational burden [35], but these have poor predictive capabilities.
For checkpoint therapy to work, an intact immune response is required. As implied by the indicators of response, the immune system must have the capacity to recognize tumor antigens as foreign. This recognition is mediated through antigen processing and presentation. Inherited variability may indicate individuals in whom this capacity is compromised. Moreover, variation in these processes may indicate individual response to immune directed therapeutic interventions.
In conclusion, the results of the germline variation studies suggest that immune mediating processes are polymorphic in the population and systematically different in HCC. Individuals with HCC have significantly lower activity for these processes, and HCC shows alterations in the "logic" of the processing and presentation pathways. As such, it may be possible to predict response to checkpoint therapy through the evaluation of the inherited genetic state of "antigen processing and presentation". Understanding these differences may provide opportunities for designing new immune checkpoint modulators and provide a rational basis for combinatorial therapy.