Benchmarking Recent Computational Tools for DNA-binding Protein Identification

Read the full article See related articles

Listed in

This article is not in any list yet, why not save it to one of your lists.
Log in to save this article

Abstract

Identification of DNA-binding proteins (DBPs) is a crucial task in genome annotation, as it aids in understanding gene regulation, DNA replication, transcriptional control and various cellular processes. In this paper, we conduct an unbiased benchmarking of nine state-of-the-art computational tools as well as traditional tools such as ScanProsite and BLAST for identifying DBPs. We highlight the data leakage issue in conventional datasets leading to inflated performance. We introduce new evaluation datasets to support further development. Through a comprehensive evaluation pipeline, we identify potential limitations in models, feature extraction techniques and training methods; and recommend solutions regarding these issues. We show that combining the predictions of the two best computational tools with BLAST based prediction significantly enhances DBP identification capability. We provide this consensus method as user-friendly software. The datasets and software are available at: https://github.com/Rafeed-bot/DNA_BP_Benchmarking .

Article activity feed