An Explainable AI Framework Integrating Machine and Deep Learning Models for Multi-Species DNA Functional Group Classification

Pratik Chakraborty
Shanthi P. B.

Abstract

DNA functional group classification across species plays a crucial role in understanding genetic diversity and biological function. The increasing availability of genomic data has led to the use of machine learning and deep learning methods for identifying functional patterns within DNA sequences. However, the limited interpretability of these models remains a challenge when validating their biological relevance. This study presents an explainable AI framework that integrates machine learning and deep learning models for multi-species DNA functional group classification. Classification is performed on Human, Chimpanzee, and Dog datasets, and on a custom combined dataset integrating the three species. The DNA sequences were transformed into k-mers to capture local compositional patterns before training. After extensive hyperparameter tuning, the Multinomial Naive Bayes model achieved the highest accuracy across all datasets, outperforming the other models in the study and previously reported results on the same datasets. While deep learning architectures captured longer motif dependencies, classical models showed stronger generalization across species. Explainable AI techniques, including Feature Importance, Saliency maps, Integrated Gradients, GradientSHAP, and Attention heatmaps, were applied to identify consensus motifs that align with known genomic and regulatory regions such as CpG-rich promoters and transmembrane domain signatures. The results demonstrate that an explainable framework can enhance biological insight and reliability in multi-species genomic analysis.
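The k-mer transformation and Multinomial Naive Bayes classification mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state the k-mer length, the smoothing value, or the tooling used, so `k=6`, `alpha=1.0`, the helper `kmers`, and the class `TinyMultinomialNB` are all assumptions introduced here for illustration, and the toy sequences and labels are invented.

```python
from collections import Counter
import math


def kmers(seq, k=6):
    """Return the overlapping k-mers of a DNA sequence (k=6 is an assumed default)."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


class TinyMultinomialNB:
    """Hypothetical minimal multinomial Naive Bayes over k-mer counts."""

    def __init__(self, k=6, alpha=1.0):
        self.k = k          # k-mer length (assumption, not from the preprint)
        self.alpha = alpha  # Laplace smoothing strength (assumption)

    def fit(self, sequences, labels):
        # Count training sequences per class and k-mer occurrences per class.
        self.class_counts = Counter(labels)
        self.kmer_counts = {c: Counter() for c in self.class_counts}
        for seq, label in zip(sequences, labels):
            self.kmer_counts[label].update(kmers(seq, self.k))
        # Vocabulary: every k-mer seen in any class during training.
        self.vocab = set().union(*self.kmer_counts.values())
        return self

    def predict(self, seq):
        n_total = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, counts in self.kmer_counts.items():
            class_total = sum(counts.values())
            # Log prior plus smoothed log likelihood of each known k-mer.
            score = math.log(self.class_counts[label] / n_total)
            for km in kmers(seq, self.k):
                if km in self.vocab:  # k-mers unseen in training are skipped
                    score += math.log(
                        (counts[km] + self.alpha)
                        / (class_total + self.alpha * vocab_size)
                    )
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy data, invented purely to exercise the sketch.
toy_sequences = ["ATATATATAT", "TATATATATA", "GCGCGCGCGC", "CGCGCGCGCG"]
toy_labels = ["at_rich", "at_rich", "gc_rich", "gc_rich"]
model = TinyMultinomialNB(k=3).fit(toy_sequences, toy_labels)
```

Calling `model.predict(...)` on a new sequence returns the class whose smoothed k-mer distribution gives it the highest log probability. In practice a library implementation would normally replace this hand-rolled class.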

Version published to 10.21203/rs.3.rs-7979065/v1 on Research Square
Nov 24, 2025

Related articles

Convolutional Deep Learning Approach to identify DNA Sequences for Gene Prediction

This article has 2 authors:
1. Jesus Antonio Motta
2. Pedro David Gomez
This article has no evaluations. Latest version: Jan 27, 2026

Benchmarking Genomic Foundation Models for Gene Fusion Detection from DNA Sequences

This article has 5 authors:
1. Radim Krupička
2. Mariana Komárková
3. Bohuslav Dvorský
4. Kateřina Kollinová
5. Ondřej Klempíř
This article has no evaluations. Latest version: Dec 23, 2025

Integrating Evolutionary and Compositional Features with ML and DL for Robust and Interpretable Druggable Protein Prediction

This article has 5 authors:
1. Mujeebu Rehman
2. Qinghua Liu
3. Muhammad Javed
4. Ali Ghulam
5. Teerath Kumar
This article has no evaluations. Latest version: Dec 11, 2025
