Development and evaluation of a prognosis prediction model for hepatocellular carcinoma via multiomics integration and semisupervised machine learning

Abstract

Hepatitis B virus-related hepatocellular carcinoma (HBV-HCC) is a malignant tumor caused primarily by chronic hepatitis B virus infection and accounts for 75.09% of all HCC cases in China. However, treatment efficacy is low when the regimen is selected on the basis of the existing tumor staging system. In this study, we developed a multiomics semisupervised collaborative ensemble learning (SSCEL) framework by combining machine learning (ML), tumor microenvironment (TME) analysis, and reverse network pharmacology. Principal component (PC) dimensionality reduction was employed to establish a comprehensive model evaluation function. On the basis of this function, the optimal ensemble model scheme (decision tree (DT), AdaBoost, XGBoost, and random forest (RF)) was constructed. HistGradientBoosting was then introduced to achieve intergenerational complementarity, forming a five-model integration framework. This model, which is based on multiomics data integration and semisupervised learning (SSL), had an area under the curve (AUC) of 0.91 for predicting recurrence in the multiomics training set and an AUC > 0.75 for predicting recurrence in the external validation set, indicating strong stability. Unlike the existing staging systems, the framework is intended mainly to predict the recurrence risk of HBV-HCC on the basis of small multiomics datasets. The levels of five core genes in the model (CST3, HSPH1, RAB2A, WASHC4, and PLK1) were found to be significantly associated with clinical recurrence in patients. These genes are mainly involved in regulating the CDK5-related pathway, sensing DNA double-strand breaks, and modulating early pancreatic gene expression regulatory pathways. Additionally, reverse network pharmacology analysis revealed 9 potential traditional Chinese medicine (TCM) compounds and 14 related target genes associated with recurrence signature genes.
These findings provide new research directions for TCM-based HCC treatment and further exploration of recurrence mechanisms. Moreover, the prognostic model developed in this study was validated across populations with diverse features and HCC etiologies and demonstrated robust performance.
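
As an illustration of how a multi-model ensemble can be scored by AUC, the following minimal Python sketch averages per-patient recurrence probabilities from several models and computes the AUC of the combined score. It is not the authors' implementation: the abstract does not state how the five models are combined, so simple probability averaging (soft voting) is an assumption here, and the names `soft_vote` and `roc_auc` as well as all numbers are hypothetical.

```python
def soft_vote(prob_lists):
    """Average per-sample probabilities across models (assumed combination rule)."""
    n_models = len(prob_lists)
    # zip(*prob_lists) groups the predictions of all models for each sample
    return [sum(per_sample) / n_models for per_sample in zip(*prob_lists)]


def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical recurrence labels (1 = recurrence) and model outputs
    labels = [1, 0, 1, 0]
    model_a = [0.75, 0.25, 0.50, 0.50]
    model_b = [0.25, 0.25, 1.00, 0.00]
    ensemble = soft_vote([model_a, model_b])
    print("ensemble scores:", ensemble)
    print("ensemble AUC:", roc_auc(labels, ensemble))
```

The pairwise definition used in `roc_auc` is quadratic in the number of samples, which is acceptable for the small cohorts the abstract describes; a rank-based computation would be preferable for larger datasets.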
