Classification of Human Transcription Factors Based on Their Effector Domains via Unsupervised Learning

Abstract

Transcription factors (TFs) combine DNA-binding domains (DBDs), which anchor them to DNA, with effector domains (EDs) that regulate transcription through activation or repression, yet the sequence logic linking ED composition to function remains unclear. Here, we systematically define proxy regions (disordered segments adjacent to DBDs) to enable quantitative analysis of ED-like sequences across the human TF repertoire. Using a biophysically interpretable 22-feature classifier (FALK22) together with an embedding-based model, the Evolutionary Scale Model (ESM), we map ED diversity and identify composition and charge-pattern signatures that correspond to regulatory activity along a disorder continuum, separating activation- from repression-associated regions. The classes identified by FALK22 align well with those identified by ESM, while FALK22 additionally provides transparent, sequence-level features. Proxy regions near C-termini exhibit gradients that track DBD families, suggesting that EDs and DBDs might have co-evolved rather than evolved independently. These results establish proxy regions and FALK22 as a framework to connect sequence features with transcriptional activity and to generate testable hypotheses about ED function and co-evolution with DBDs.

HIGHLIGHTS

  • We define proxy regions as systematically identified disordered segments adjacent to DNA-binding domains (DBDs), enabling quantitative analysis of effector domain (ED)-like sequences across the human transcription factor (TF) repertoire.

  • We develop FALK22, a 22-feature algorithm that classifies TFs based on simple sequence properties of their EDs and shows strong alignment with complex embedding-based representations from the Evolutionary Scale Model (ESM).

  • FALK22 and ESM uncover distinct amino-acid composition and patterning signatures of EDs that correlate with transcriptional function, separating activation- and repression-associated regions along a disorder continuum.

  • Proxy regions located at the C-termini exhibit gradients that correspond to their DBD families, suggesting that EDs did not evolve as independent modular units but rather co-evolved with, or became selectively matched to, their DBD contexts.
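
To make the idea of "simple sequence properties" concrete, the sketch below computes a handful of composition and charge descriptors for a single disordered segment. It is only an illustration: the function name `sequence_features`, the residue groupings, and the six descriptors are assumptions chosen for this example and are not the 22 features used by FALK22, which this summary does not list.

```python
from collections import Counter

# Residue groupings are illustrative assumptions, not the FALK22 definitions.
ACIDIC = set("DE")
BASIC = set("KR")
AROMATIC = set("FWY")


def sequence_features(seq: str) -> dict[str, float]:
    """Return simple composition and charge descriptors for one sequence."""
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        raise ValueError("sequence must be non-empty")

    counts = Counter(seq)
    acidic = sum(counts[a] for a in ACIDIC)
    basic = sum(counts[a] for a in BASIC)
    aromatic = sum(counts[a] for a in AROMATIC)

    return {
        "frac_acidic": acidic / n,
        "frac_basic": basic / n,
        "frac_aromatic": aromatic / n,
        "frac_proline": counts["P"] / n,
        # Fraction of charged residues (acidic plus basic).
        "fcr": (acidic + basic) / n,
        # Net charge per residue (basic minus acidic).
        "ncpr": (basic - acidic) / n,
    }


# Toy sequence, not a real proxy region.
features = sequence_features("DDEEKKFP")
```

Feature vectors of this kind, computed per proxy region, could then be passed to any unsupervised clustering method; which method and which features the authors use is described in the full article rather than here.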
