OpEnHiMR: Optimization based Ensemble Model for Prediction of Histone Modification in Rice

Abstract

Histone modifications are central to gene regulation, yet their systematic identification in plants remains limited by the complexity of epigenomic landscapes. We present OpEnHiMR, an optimization-based ensemble learning framework for multiclass prediction of three key histone modifications (H3K4me3, H3K27me3, and H3K9ac) in rice. The framework integrates Support Vector Machine, Random Forest, and Gradient Boosting models, optimized via Ant Colony Optimization to maximize performance. After rigorous data curation and redundancy removal, the models were trained on biologically meaningful features: mononucleotide binary encoding, nucleotide chemical properties, GC content, and k-mer frequencies. OpEnHiMR achieved a classification accuracy of 77.54%, outperforming the individual models, with improved recall, specificity, and Matthews correlation coefficient. SHAP analysis was used to improve model interpretability and highlighted the sequence features that most influence prediction outcomes. To promote community-wide adoption, a user-friendly webserver (https://dipro-sinha.shinyapps.io/OpEnHiMR/) and an R package (https://cran.r-project.org/web/packages/OpEnHiMR/index.html) were developed. OpEnHiMR thus provides a scalable, accurate, and interpretable tool for histone modification prediction in plants, advancing epigenomics research and supporting data-driven crop improvement strategies.
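To make the four feature families named in the abstract concrete, the following is a minimal Python sketch of how a DNA sequence could be turned into such a feature vector. It is not the OpEnHiMR implementation (the released tool is an R package). The function names, the choice of k = 2, the order in which features are concatenated, and the three-bit chemical-property table (ring structure, functional group, hydrogen-bond strength) are all assumptions made for illustration.

```python
from itertools import product

# Mononucleotide binary (one-hot) encoding: one 4-bit block per base.
ONE_HOT = {"A": (1, 0, 0, 0), "C": (0, 1, 0, 0),
           "G": (0, 0, 1, 0), "T": (0, 0, 0, 1)}

# Assumed nucleotide chemical property table:
# (purine ring, amino group, weak hydrogen bonding).
NCP = {"A": (1, 1, 1), "C": (0, 1, 0), "G": (1, 0, 0), "T": (0, 0, 1)}


def binary_encoding(seq):
    """Concatenate the one-hot block of every base."""
    return [bit for base in seq for bit in ONE_HOT[base]]


def chemical_properties(seq):
    """Concatenate the three chemical-property bits of every base."""
    return [bit for base in seq for bit in NCP[base]]


def gc_content(seq):
    """Fraction of bases that are G or C."""
    return (seq.count("G") + seq.count("C")) / len(seq)


def kmer_frequencies(seq, k=2):
    """Relative frequency of each k-mer, in lexicographic ACGT order."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = len(seq) - k + 1  # number of overlapping windows
    for i in range(total):
        counts[seq[i:i + k]] += 1
    return [counts[m] / total for m in kmers]


def encode(seq, k=2):
    """Hypothetical combined feature vector for one sequence."""
    seq = seq.upper()
    return (binary_encoding(seq) + chemical_properties(seq)
            + [gc_content(seq)] + kmer_frequencies(seq, k))
```

For a sequence of length n this sketch yields 4n + 3n + 1 + 4^k values, so all input sequences would need the same length before the vectors are passed to a classifier. Sequences containing characters other than A, C, G, and T raise a `KeyError` here; a real pipeline would need an explicit policy for ambiguous bases.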
