“Frustratingly easy” domain adaptation for cross-species transcription factor binding prediction

Abstract

Motivation

Understanding how DNA sequence encodes gene regulation remains a central challenge in genomics. While deep learning models can predict regulatory activity from sequence with high accuracy, their generalizability across species—and thus their ability to capture fundamental biological principles—remains limited. Cross-species prediction provides a powerful test of model robustness and offers a window into conserved regulatory logic, but effectively bridging species-specific genomic differences remains a major barrier.

Results

We present MORALE, a novel and scalable domain adaptation framework that significantly advances cross-species prediction of transcription factor (TF) binding. By aligning statistical moments of sequence embeddings across species, MORALE enables deep learning models to learn species-invariant regulatory features without requiring adversarial training or complex architectures. Applied to multi-species TF ChIP-seq datasets, MORALE achieves state-of-the-art performance—outperforming both baseline and adversarial approaches across all TFs—while preserving model interpretability and recovering canonical motifs with greater precision. In the five-species transfer setting, MORALE not only improves human prediction accuracy beyond human-only training but also reveals regulatory features conserved across mammals. These results highlight the potential of simple yet powerful domain adaptation techniques to drive generalization and discovery in regulatory genomics. Crucially, MORALE is architecture-agnostic and can be seamlessly integrated into any embedding-based sequence model.
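As an illustration of what "aligning statistical moments of sequence embeddings" could look like, the following is a minimal sketch and not the authors' implementation. It assumes the embeddings are available as plain arrays of shape (sequences, features) and penalizes the gap between the first moment (mean) and second moment (covariance) of two species' batches. The function name `moment_alignment_loss`, the 1/(4d²) scaling on the covariance term, and the synthetic "human" and "mouse" arrays are all hypothetical choices made for this example.

```python
import numpy as np


def moment_alignment_loss(source_emb: np.ndarray, target_emb: np.ndarray) -> float:
    """Squared gap between the means and covariances of two embedding batches.

    Both inputs have shape (n_sequences, n_features). This is an illustrative
    loss, not the formulation used by MORALE.
    """
    # First moment: per-feature mean of each batch.
    mean_gap = source_emb.mean(axis=0) - target_emb.mean(axis=0)
    # Second moment: feature-by-feature covariance of each batch.
    cov_gap = np.cov(source_emb, rowvar=False) - np.cov(target_emb, rowvar=False)
    n_features = source_emb.shape[1]
    # Assumed scaling of the covariance term by 1 / (4 d^2).
    return float(np.sum(mean_gap**2) + np.sum(cov_gap**2) / (4 * n_features**2))


# Synthetic stand-ins for embeddings from two species (hypothetical data).
rng = np.random.default_rng(0)
human = rng.normal(loc=0.0, scale=1.0, size=(256, 8))
mouse = rng.normal(loc=0.5, scale=2.0, size=(256, 8))

same = moment_alignment_loss(human, human)      # identical batches
shifted = moment_alignment_loss(human, mouse)   # shifted and rescaled batch
print(f"identical batches: {same:.4f}, shifted batches: {shifted:.4f}")
```

In a training setting, a term like this would typically be added to the prediction loss so that the embedding layer is pushed toward features whose distribution looks the same in both species; how MORALE weights or computes such a term is described in the full article and the linked code.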

Availability

Code is available at https://github.com/loudrxiv/frustrating.
