omicML: An Integrative Bioinformatics and Machine Learning Framework for Transcriptomic Biomarker Identification

Abstract

Introduction

Transcriptomic biomarker discovery remains challenging because of variation across datasets and platforms, the complexity of the statistical and computational methods involved, the need to combine multiple programming languages, and the intricacy of the machine learning workflows used to evaluate candidate biomarkers. Standard workflows require several stages (quality control, normalization, and differential expression analysis), typically executed in R or Python, which creates bottlenecks for non-experts. Existing platforms have alleviated some of these challenges by offering graphical interfaces for data loading, normalization, differential gene expression analysis, and functional analysis; nevertheless, they typically do not incorporate integrated machine learning procedures for biomarker selection.

Method

To address this gap, we present omicML, an intuitive graphical user interface (GUI) that combines transcriptomic data analysis with machine learning (ML)-based classification by integrating R and Python packages and libraries. It supports both RNA-Seq and microarray data and automates preprocessing (data import, quality control, and normalization) and differential expression analysis. The tool annotates differentially expressed genes (DEGs) with descriptions, gene ontology terms, and pathway information, and it supports comparative analysis. Its ML pipeline enables both supervised and unsupervised learning, integrates multiple datasets based on candidate gene signatures, standardizes features and eliminates less significant ones, benchmarks multiple ML classifiers with performance metrics such as AUROC and AUPRC, assesses feature importance, develops single-gene and multi-gene predictive models, and systematically finalizes the biomarker algorithm. All of these functions are available within omicML, lowering the barrier for biologists without computational expertise.
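As a rough illustration of the single-gene evaluation step mentioned above, the following Python sketch ranks genes by how well each one alone separates two classes, using AUROC and an average-precision estimate of AUPRC. It is not omicML's implementation: the function names, the toy expression values, and the gene labels `GENE_A`, `GENE_B`, and `GENE_C` are hypothetical, and the metrics are written out with the standard library only so the arithmetic is visible. In practice an established library such as scikit-learn would normally be used, and the direction of each gene would be chosen on training data and assessed on held-out samples rather than on the same data as here.

```python
from typing import Dict, List, Sequence, Tuple


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Equivalent to the normalized Mann-Whitney U statistic; ties count as half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUPRC estimated as the mean precision at the rank of each positive.

    Assumes no tied scores; with ties the result depends on sort order.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    total = 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            total += hits / rank
    if hits == 0:
        raise ValueError("need at least one positive sample")
    return total / hits


def rank_single_gene_models(
    expression: Dict[str, List[float]], labels: Sequence[int]
) -> List[Tuple[str, str, float, float]]:
    """Score each gene as a one-feature classifier and sort by AUROC.

    Returns (gene, direction, AUROC, AUPRC) tuples. A gene that is lower in
    the positive class is scored on its negated expression ("down").
    """
    results = []
    for gene, values in expression.items():
        raw = auroc(values, labels)
        direction = "up" if raw >= 0.5 else "down"
        oriented = values if direction == "up" else [-v for v in values]
        results.append(
            (gene, direction, auroc(oriented, labels),
             average_precision(oriented, labels))
        )
    # Highest AUROC first; gene name breaks ties so the order is deterministic.
    return sorted(results, key=lambda r: (-r[2], r[0]))


# Hypothetical toy data: three positive samples followed by three negatives.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_expression = {
    "GENE_A": [5.1, 4.8, 5.5, 2.0, 2.2, 1.9],  # higher in positives
    "GENE_B": [3.0, 1.0, 2.5, 2.0, 3.5, 1.5],  # weak separation
    "GENE_C": [1.0, 1.2, 0.9, 4.0, 4.1, 3.8],  # lower in positives
}

ranking = rank_single_gene_models(toy_expression, toy_labels)
for gene, direction, roc, prc in ranking:
    print(f"{gene}\t{direction}\tAUROC={roc:.3f}\tAUPRC={prc:.3f}")
```

Ranking by a threshold-free metric such as AUROC lets genes measured on different scales be compared directly, which is one reason it is a common first pass before multi-gene models are fitted.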

Result

In a case study, omicML identified a six-gene diagnostic model that distinguishes Mpox (monkeypox virus) infections from those caused by other viruses, including SARS-CoV-2, HIV, Ebola, and varicella-zoster. These results illustrate omicML’s capacity to discern clinically relevant biomarkers from complex transcriptome data.

Conclusion

By unifying data preprocessing, differential gene expression analysis, annotation, heatmap analysis, dataset integration, batch effect correction, machine learning, and functional analysis in a single system, omicML ( https://omicml.org ) can lower technical barriers and accelerate the conversion of expression data into diagnostic insights for clinicians and bench scientists.
