Identification of Splicing Sites Via Integrated Features and Support Vector Machine

Abstract

Gene splicing plays an extremely important role in protein diversity. In eukaryotic gene expression, exon sequences are mostly interrupted by introns, so completely removing the introns and joining the remaining exons is an essential step. Owing to its importance in genetic engineering, identifying these splicing sites is highly desirable. With conventional experimental approaches this is a difficult task, and in some situations an impossible one. As the number of genome sequences grows at an exponential rate, developing a precise, reliable, and robust computational approach for fast prediction of splicing sites remains a challenge. In the current study, an ensemble feature space of Kmer, RevKmer, and pseudo trinucleotide composition (PseTNC) is applied to extract features that numerically describe the biological samples. These features were then passed to three classification algorithms: random forest, k-nearest neighbor, and support vector machine (SVM). Evaluated with the jackknife test, the proposed model achieved promising results of 93.92% and 96.39% for datasets S and S, respectively. The identification performance of the current model (TargetSS) is better than that of existing methods. We conclude that the proposed model for splicing site identification will prove a useful tool for bioinformatics, computational biology, molecular biology, and drug discovery applications.
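
To illustrate how sequence features of this kind can describe a sample numerically, the following is a minimal Python sketch of Kmer and RevKmer composition and their concatenation into one vector. It is not the authors' implementation: the function names (`kmer_features`, `revkmer_features`, `integrated_features`), the normalization by window count, and the choice of `k` are assumptions made for illustration. PseTNC is left out because it requires physicochemical property values that the abstract does not give.

```python
from itertools import product

ALPHABET = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(kmer):
    """Return the reverse complement of a DNA string."""
    return kmer.translate(COMPLEMENT)[::-1]


def kmer_features(seq, k):
    """Normalized frequency of each of the 4**k k-mers (assumed normalization)."""
    keys = ["".join(p) for p in product(ALPHABET, repeat=k)]
    counts = dict.fromkeys(keys, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # skip windows with ambiguous bases such as N
            counts[window] += 1
            total += 1
    return [counts[key] / total if total else 0.0 for key in keys]


def revkmer_features(seq, k):
    """Like kmer_features, but a k-mer and its reverse complement share one bin."""
    all_kmers = ("".join(p) for p in product(ALPHABET, repeat=k))
    keys = sorted({min(m, reverse_complement(m)) for m in all_kmers})
    counts = dict.fromkeys(keys, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        canonical = min(window, reverse_complement(window))
        if canonical in counts:
            counts[canonical] += 1
            total += 1
    return [counts[key] / total if total else 0.0 for key in keys]


def integrated_features(seq, k=2):
    """Concatenate both encodings into a single feature vector (hypothetical helper)."""
    return kmer_features(seq, k) + revkmer_features(seq, k)


vector = integrated_features("ACGTACGGTAAGT", k=2)
print(len(vector))
```

A vector built this way could be handed to any standard classifier, such as an SVM, and scored with leave-one-out (jackknife) cross-validation; that step is omitted here to keep the sketch free of third-party dependencies.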
