Transcription factor prediction using protein 3D secondary structures

Jeanine Liebold
Fabian Neuhaus
Janina Geiser
Stefan Kurtz
Jan Baumbach
Khalique Newaz

Abstract

Motivation

Transcription factors (TFs) are DNA-binding proteins that regulate gene expression. Traditional methods predict a protein as a TF if the protein contains any DNA-binding domains (DBDs) of known TFs. However, this approach fails to identify a novel TF that does not contain any known DBDs. Recently proposed TF prediction methods do not rely on DBDs. Such methods use features of protein sequences to train a machine learning model, and then use the trained model to predict whether a protein is a TF or not. Because the 3-dimensional (3D) structure of a protein captures more information than its sequence, using 3D protein structures will likely allow for more accurate prediction of novel TFs.

Results

We propose a deep learning-based TF prediction method (StrucTFactor), which is the first method to utilize 3D secondary structural information of proteins. We compare StrucTFactor with recent state-of-the-art TF prediction methods based on ∼525 000 proteins across 12 datasets, capturing different aspects of data bias (including sequence redundancy) possibly influencing a method’s performance. We find that StrucTFactor significantly (P-value < 0.001) outperforms the existing TF prediction methods, improving the performance over its closest competitor by up to 17% based on Matthews correlation coefficient.
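For readers unfamiliar with the metric used above, the following is a minimal sketch of how the Matthews correlation coefficient (MCC) is computed from a binary confusion matrix. It is not taken from the StrucTFactor code base; the function name `mcc` and the example counts are hypothetical and chosen only for illustration. MCC ranges from -1 to 1 and stays informative on imbalanced data, which matters in settings where TFs are a small fraction of all proteins.

```python
import math


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient for a binary confusion matrix.

    tp/tn/fp/fn are counts of true positives, true negatives,
    false positives, and false negatives.
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By common convention, MCC is reported as 0 when any marginal is empty.
    if denominator == 0:
        return 0.0
    return numerator / denominator


# Hypothetical counts (not from the article): 100 TFs among 1100 proteins.
example = mcc(tp=90, tn=900, fp=100, fn=10)
print(f"MCC = {example:.3f}")
```

In this hypothetical example, accuracy would be 90% even though more than half of the predicted TFs are false positives; MCC penalizes that imbalance more visibly.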

Availability and implementation

Data and source code are available at https://github.com/lieboldj/StrucTFactor and on our website at https://apps.cosy.bio/StrucTFactor

Version published to 10.1093/bioinformatics/btae762: Dec 26, 2024
Version published to 10.1101/2024.03.14.585054 on bioRxiv: Mar 15, 2024
