UTR-DynaPro: A CNN–Transformer Multimodal Language Model for Decoding 5′UTR Regulatory Mechanisms
Abstract
The 5′ untranslated region (5′UTR) plays a pivotal role in controlling translation efficiency and protein synthesis. However, existing models often struggle to jointly capture local regulatory motifs and long-range dependencies while effectively integrating multimodal biological features. We present UTR-DynaPro, a multimodal language model that combines a parallel CNN–Transformer architecture with a k-mer–specific mixture-of-experts module and a dynamic fusion mechanism. The CNN branch extracts contiguous motif patterns, the Transformer branch models hierarchical long-range interactions, and the dynamic fusion gate adaptively integrates their outputs alongside multimodal features such as minimum free energy and CDS co-adaptivity. Across translation efficiency, mean ribosome load, and expression level prediction tasks, UTR-DynaPro achieves improvements of up to 3.3%, 2.2%, and 2.4%, respectively, over state-of-the-art methods. Attention-based motif analysis further identifies both known and novel regulatory elements with consistent performance across cell types, offering a generalizable framework for decoding complex 5′UTR regulation and guiding the design of high-performance regulatory sequences.
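To make the dynamic fusion mechanism concrete, the sketch below shows one plausible form of a gated fusion layer that adaptively blends pooled CNN and Transformer branch embeddings with auxiliary scalar features such as minimum free energy and CDS co-adaptivity. This is a minimal illustration under assumed dimensions, not the published UTR-DynaPro implementation; the class name `DynamicFusionGate`, the sigmoid convex-blend gating form, and all parameter names are hypothetical.

```python
import torch
import torch.nn as nn

class DynamicFusionGate(nn.Module):
    """Hypothetical gated fusion of CNN and Transformer branch embeddings
    with auxiliary scalar features (e.g., minimum free energy, CDS
    co-adaptivity). A sketch of one possible design, not the paper's code."""

    def __init__(self, d_model: int, n_aux: int = 2):
        super().__init__()
        # The gate conditions on both branches plus the auxiliary features
        # and emits per-dimension mixing weights in (0, 1).
        self.gate = nn.Sequential(
            nn.Linear(2 * d_model + n_aux, d_model),
            nn.Sigmoid(),
        )
        # Final projection folds the auxiliary features back into the
        # fused representation.
        self.proj = nn.Linear(d_model + n_aux, d_model)

    def forward(self, h_cnn, h_trf, aux):
        # h_cnn, h_trf: (batch, d_model) pooled branch embeddings
        # aux:          (batch, n_aux) scalar biological features
        g = self.gate(torch.cat([h_cnn, h_trf, aux], dim=-1))
        fused = g * h_cnn + (1.0 - g) * h_trf  # adaptive convex blend
        return self.proj(torch.cat([fused, aux], dim=-1))

# Toy usage: batch of 4 sequences, 256-dim embeddings, 2 auxiliary features.
fuse = DynamicFusionGate(d_model=256)
out = fuse(torch.randn(4, 256), torch.randn(4, 256), torch.randn(4, 2))
print(out.shape)  # torch.Size([4, 256])
```

A convex blend of this kind lets the gate favor the CNN branch when local motif evidence dominates and the Transformer branch when long-range context matters, consistent with the adaptive integration the abstract describes.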