Hybrid BiGRU-BiLSTM Model for Robust Gene Sequence Classification: Leveraging K-Mer Preprocessing and Comprehensive Evaluation
Abstract
Genome sequencing is a powerful technology that decodes the DNA sequence of an organism, providing detailed insight into its genetic information. By identifying variations in these sequences, researchers can pinpoint the genetic causes of various diseases. However, accurately classifying gene sequences remains a major challenge because genetic data are complex and high-dimensional. This study develops an accurate gene classification model for sequence data produced by advanced sequencing technologies. Gene sequences are first preprocessed using k-mer techniques. Classification is then performed by a hybrid model that combines Bidirectional Gated Recurrent Units (Bi-GRU) and Bidirectional Long Short-Term Memory (Bi-LSTM) networks. The hybrid model harnesses the strengths of both architectures: Bi-GRU's efficient handling of sequential data and Bi-LSTM's stronger long-term memory retention. The model is validated on a genome dataset comprising nine distinct sub-datasets. For comparison, the individual performance of Bi-GRU, Bi-LSTM, a convolutional neural network (CNN), and several machine learning classifiers (XGBoost, Random Forest (RF), Decision Tree, k-Nearest Neighbors (KNN), Naïve Bayes, and a Voting classifier) was also analyzed. The hybrid model consistently achieved the highest accuracy across all nine sub-datasets, demonstrating its efficacy in gene classification.
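The abstract does not specify how the k-mer preprocessing is configured. The sketch below shows one common way to turn a DNA string into an integer sequence that a recurrent model such as the Bi-GRU/Bi-LSTM hybrid could consume. It assumes overlapping k-mers with stride 1, an illustrative k of 6, a fixed vocabulary over the bases A, C, G and T, and reserved ids for padding and unknown k-mers (e.g. those containing N). The function names `kmers`, `build_vocab` and `encode` are hypothetical and are not taken from the study.

```python
from itertools import product

PAD_ID = 0  # assumption: id used to pad short sequences to a fixed length
UNK_ID = 1  # assumption: id for k-mers containing non-ACGT symbols (e.g. 'N')


def kmers(seq: str, k: int = 6) -> list[str]:
    """Split a DNA sequence into overlapping k-mers (stride 1, assumed)."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_vocab(k: int = 6) -> dict[str, int]:
    """Map every possible k-mer over A, C, G, T (4**k of them) to an id >= 2."""
    return {"".join(p): i + 2 for i, p in enumerate(product("ACGT", repeat=k))}


def encode(seq: str, vocab: dict[str, int], k: int = 6, max_len: int = 100) -> list[int]:
    """Convert a sequence to a fixed-length list of k-mer ids for an embedding layer."""
    ids = [vocab.get(km, UNK_ID) for km in kmers(seq, k)]
    ids = ids[:max_len]                            # truncate long sequences
    return ids + [PAD_ID] * (max_len - len(ids))   # right-pad short sequences


if __name__ == "__main__":
    vocab = build_vocab(3)
    print(kmers("ATGCGT", 3))               # ['ATG', 'TGC', 'GCG', 'CGT']
    print(encode("ATGNGT", vocab, 3, 6))    # first id is ATG's, then three UNK ids, then padding
```

In a sketch like this, the resulting id sequences would typically pass through an embedding layer before the bidirectional recurrent layers. A bag-of-k-mer count vector is a common alternative input for classical classifiers such as Random Forest or Naïve Bayes; which representation the study used for each model is not stated in the abstract.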