“Frustratingly easy” domain adaptation for cross-species transcription factor binding prediction

Mark Maher Ebeid
Ali Tuğrul Balcı
Maria Chikina
Panayiotis V Benos
Dennis Kostka

Read the full article

Discuss this preprint

Start a discussion What are Sciety discussions?

Listed in

This article is not in any list yet, why not save it to one of your lists.

Abstract

Motivation

Understanding how DNA sequence encodes gene regulation remains a central challenge in genomics. While deep learning models can predict regulatory activity from sequence with high accuracy, their generalizability across species—and thus their ability to capture fundamental biological principles—remains limited. Cross-species prediction provides a powerful test of model robustness and offers a window into conserved regulatory logic, but effectively bridging species-specific genomic differences remains a major barrier.

Results

We present MORALE, a novel and scalable domain adaptation framework that significantly advances cross-species prediction of transcription factor (TF) binding. By aligning statistical moments of sequence embeddings across species, MORALE enables deep learning models to learn species-invariant regulatory features without requiring adversarial training or complex architectures. Applied to multi-species TF ChIP-seq datasets, MORALE achieves state-of-the-art performance—outperforming both baseline and adversarial approaches across all TFs—while preserving model interpretability and recovering canonical motifs with greater precision. In the five-species transfer setting, MORALE not only improves human prediction accuracy beyond human-only training but also reveals regulatory features conserved across mammals. These results highlight the potential of simple yet powerful domain adaptation techniques to drive generalization and discovery in regulatory genomics. Crucially, MORALE is architecture-agnostic and can be seamlessly integrated into any embedding-based sequence model.

Availability

Code is available at https://github.com/loudrxiv/frustrating .

Version published to 10.1101/2025.05.21.655414 on bioRxiv
May 26, 2025

Unlocking the genomic landscape for antimicrobial domain discovery with a two-stage progressive residue-level annotation model

This article has 13 authors:
1. Peilin Xie
2. Xingchen Liu
3. Lantian Yao
4. Zhihao Zhao
5. Anming Yang
6. Jiahui Guan
7. Zijun Jiao
8. Zhihong Liu
9. Junwen Wang
10. Tzong-Yi Lee
11. Zigang Li
12. Bingyu Cui
13. Ying-Chih Chiang
This article has no evaluationsLatest version Dec 11, 2025
Artificial Intelligence–Driven Structural Mining Enables Functional Inference in the Human Dark Proteome

This article has 7 authors:
1. Valentina Carbonari
2. Annamaria Defilippo
3. Ugo Lomoio
4. Caterina Francesca Perri
5. Barbara Puccio
6. Pierangelo Veltri
7. Pietro Hiram Guzzi
This article has no evaluationsLatest version Dec 23, 2025
Benchmarking Reveals the Superiority of Nucleic Acid Foundation Models in Predicting lncRNA Coding Potential

This article has 5 authors:
1. Yu Yang
2. Liping Ren
3. Juan Feng
4. Yang Zhang
5. Tianyuan Liu
This article has no evaluationsLatest version Dec 17, 2025

Discuss this preprint

Listed in

Abstract

Motivation

Results

Availability

Article activity feed

Related articles

Unlocking the genomic landscape for antimicrobial domain discovery with a two-stage progressive residue-level annotation model

Artificial Intelligence–Driven Structural Mining Enables Functional Inference in the Human Dark Proteome

Benchmarking Reveals the Superiority of Nucleic Acid Foundation Models in Predicting lncRNA Coding Potential