An Explainable AI Framework Integrating Machine and Deep Learning Models for Multi-Species DNA Functional Group Classification
Abstract
DNA functional group classification across species plays a crucial role in understanding genetic diversity and biological function. The increasing availability of genomic data has led to the use of machine learning and deep learning methods for identifying functional patterns within DNA sequences. However, the limited interpretability of these models makes it difficult to validate the biological relevance of their predictions. This study presents an explainable AI framework that integrates machine learning and deep learning models for multi-species DNA functional group classification. Classification was performed on Human, Chimpanzee, and Dog datasets, and on a custom combined dataset integrating the three species. The DNA sequences were transformed into k-mers before training to capture local compositional patterns. After extensive hyperparameter tuning, the Multinomial Naive Bayes model achieved the highest accuracy across all datasets, outperforming the other models in the study as well as previously reported results on the same datasets. While deep learning architectures captured longer motif dependencies, classical models showed stronger generalization across species. Explainable AI techniques, including Feature Importance, Saliency maps, Integrated Gradients, GradientSHAP, and Attention heatmaps, were applied to identify consensus motifs that align with known genomic and regulatory regions such as CpG-rich promoters and transmembrane domain signatures. The results demonstrate that an explainable framework can enhance biological insight and reliability in multi-species genomic analysis.
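The k-mer transformation mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state the value of k or whether the windows overlap, so the overlapping windows, the default `k=6`, and the function names `kmers` and `kmer_counts` below are assumptions made purely for illustration, not the authors' implementation.

```python
from collections import Counter


def kmers(sequence: str, k: int = 6) -> list[str]:
    """Split a DNA sequence into overlapping k-mers (sliding window, stride 1).

    k=6 is an assumed default; the abstract does not give the value used.
    Sequences shorter than k yield an empty list.
    """
    seq = sequence.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def kmer_counts(sequence: str, k: int = 6) -> Counter:
    """Count k-mer occurrences, i.e. a bag-of-k-mers representation."""
    return Counter(kmers(sequence, k))


if __name__ == "__main__":
    example = "ATGCATGCA"  # toy sequence, not taken from the datasets
    print(kmers(example, k=4))
    # ['ATGC', 'TGCA', 'GCAT', 'CATG', 'ATGC', 'TGCA']
    print(kmer_counts(example, k=4)["ATGC"])
    # 2
```

Count vectors of this kind are one plausible input for a count-based classifier such as Multinomial Naive Bayes, which models each class as a distribution over token frequencies.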