TFClassPredict: A Novel Deep Learning Framework for Transcription Factor Binding Site Analysis Using Evolutionarily Conserved DNA-Binding Domain Annotations



Abstract

Transcription factors (TFs) are proteins that regulate gene expression by binding to short, specific DNA sequences. This binding is mediated by their DNA-binding domains (DBDs). Interactions between TFs and DNA are fundamental to understanding the gene regulation mechanisms that underlie many cellular activities and processes. Although many models have been developed to predict TF binding, a comprehensive model that accounts for the similar binding characteristics of TFs within the same DBD family is still lacking. Our model, TFClassPredict, introduces a novel approach to identifying transcription factor binding sites (TFBSs) based on structural annotations of evolutionarily conserved DBDs. By leveraging strong canonical binding patterns, TFClassPredict provides high-confidence predictions, which are crucial for reliable regulatory analysis. TFClassPredict was developed by fine-tuning the DNABERT model to classify DNA sequences at two hierarchical levels that reflect different degrees of DBD similarity: 6 Superclasses at the top level and 17 more specific Classes below them. It achieved high performance at both levels, with an average AUC of 0.988, precision of 0.942, and recall of 0.906 at the Superclass level, and an average AUC of 0.990, precision of 0.937, and recall of 0.914 at the Class level. TFClassPredict was also able to reveal distinct regulatory landscapes associated with cancer progression. The Class-level model is publicly available at https://gitlab.gwdg.de/hti/tfclass_dnabert.
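As background for readers unfamiliar with DNABERT: the original DNABERT models take a DNA sequence as a series of overlapping k-mers instead of single nucleotides. The sketch below illustrates that preprocessing step only. The function name `kmer_tokens` is hypothetical, and the choice of k = 6 is an assumption, since the abstract does not state which k-mer size TFClassPredict uses.

```python
from typing import List


def kmer_tokens(sequence: str, k: int = 6) -> List[str]:
    """Split a DNA sequence into overlapping k-mers with stride 1.

    k = 6 is an assumed default; the abstract does not specify it.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    seq = sequence.upper()
    # A sequence shorter than k yields an empty list.
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


# A 10-nucleotide sequence yields 10 - 6 + 1 = 5 overlapping 6-mers.
example = kmer_tokens("ACGTAGCTAG", k=6)
print(" ".join(example))
# prints: ACGTAG CGTAGC GTAGCT TAGCTA AGCTAG
```

The space-joined k-mer string is the kind of input a k-mer-based tokenizer would then map to token IDs before classification into Superclass or Class labels.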
