Integrative Machine Learning Approaches to Identify and Validate Gene Biomarkers for Early Detection of Hepatocellular Carcinoma

Abstract

Hepatocellular carcinoma (HCC) is among the leading causes of cancer-related deaths worldwide, and prognosis is poor when the disease is detected at an advanced stage. Early diagnostic biomarkers are urgently needed to enable timely intervention. Current diagnostic methods, such as liver function tests (LFTs), alpha-fetoprotein (AFP) panels, and imaging techniques like magnetic resonance imaging (MRI) and ultrasound, lack specificity for HCC and do not provide a comprehensive prognosis. This study proposes a machine learning (ML)-based approach for identifying early HCC biomarkers from RNA-sequencing (RNA-seq) data. We analyzed publicly available RNA-seq datasets from the Gene Expression Omnibus (GEO), UCSC Xena, and the GEO RNA-seq Experiments Interactive Navigator (GREIN). We applied several ML-based feature selection methods, and a Random Forest (RF) model achieved the best performance in identifying and predicting the most important genes. The candidate genes were validated with bioinformatics tools and resources, including the Search Tool for the Retrieval of Interacting Genes/Proteins (STRING), Gene Ontology (GO), the Database for Annotation, Visualization, and Integrated Discovery (DAVID), the Human Protein Atlas (HPA), and the Comparative Toxicogenomics Database (CTD). Through this analysis, we identified six potential early-detection gene biomarkers for HCC: CDKN3, LIFR, MKI67, TOP2A, SLC5A1, and VIPR1.
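The abstract does not detail how the Random Forest ranks genes. As a rough illustration only, the sketch below scores genes by Random Forest mean decrease in impurity (Gini importance), which is one common way RF models rank features. It is written in pure Python to stay self-contained. The function names, hyperparameters (`n_trees`, `max_depth`, the sqrt(p) feature-subset size), gene labels, and toy expression matrix are all hypothetical assumptions, not the authors' data, pipeline, or code. The trees are grown only to accumulate importances; prediction is omitted.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(X, y, features):
    """Best (gain, feature, threshold) over the candidate features, or None."""
    n, parent = len(y), gini(y)
    best = None
    for f in features:
        for t in sorted({row[f] for row in X}):
            left = [y[i] for i in range(n) if X[i][f] <= t]
            right = [y[i] for i in range(n) if X[i][f] > t]
            if not left or not right:
                continue
            child = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or parent - child > best[0]:
                best = (parent - child, f, t)
    return best


def grow(X, y, n_root, imp, rng, max_features, depth, max_depth):
    """Grow one tree, adding each split's weighted impurity decrease to imp."""
    majority = Counter(y).most_common(1)[0][0]
    if depth >= max_depth or len(set(y)) == 1:
        return majority
    # Random feature subset at each node, as in a random forest.
    split = best_split(X, y, rng.sample(range(len(X[0])), max_features))
    if split is None or split[0] <= 0:
        return majority
    gain, f, t = split
    imp[f] += gain * len(y) / n_root  # mean decrease in impurity (MDI)
    li = [i for i in range(len(y)) if X[i][f] <= t]
    ri = [i for i in range(len(y)) if X[i][f] > t]
    args = (n_root, imp, rng, max_features, depth + 1, max_depth)
    return (f, t,
            grow([X[i] for i in li], [y[i] for i in li], *args),
            grow([X[i] for i in ri], [y[i] for i in ri], *args))


def rank_genes(X, y, gene_names, n_trees=50, max_depth=3, seed=0):
    """Rank genes by normalized Gini importance summed over bootstrap trees."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    max_features = max(1, int(p ** 0.5))  # common sqrt(p) heuristic
    imp = [0.0] * p
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap resample
        grow([X[i] for i in idx], [y[i] for i in idx], n, imp, rng,
             max_features, 0, max_depth)
    total = sum(imp) or 1.0
    scores = [(g, v / total) for g, v in zip(gene_names, imp)]
    return sorted(scores, key=lambda gs: gs[1], reverse=True)


def make_toy_data(n_samples=60, seed=42):
    """Synthetic expression matrix in which only gene_A differs by class."""
    rng = random.Random(seed)
    genes = ["gene_A", "gene_B", "gene_C", "gene_D", "gene_E"]  # hypothetical
    X, y = [], []
    for i in range(n_samples):
        label = i % 2  # 0 = "normal", 1 = "tumor" (synthetic labels)
        row = [rng.gauss(0.0, 1.0) for _ in genes]
        row[0] += 3.0 * label  # inject signal into gene_A only
        X.append(row)
        y.append(label)
    return X, y, genes


if __name__ == "__main__":
    X, y, genes = make_toy_data()
    for gene, score in rank_genes(X, y, genes):
        print(f"{gene}\t{score:.3f}")
```

On this toy matrix the signal-carrying gene should rise to the top of the ranking. A real RNA-seq analysis would typically use a tested library implementation (for example scikit-learn's `RandomForestClassifier`) and would account for the known bias of impurity-based importance, for instance by cross-checking with permutation importance.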