Improving DNA Modeling with WaveDNA: Enhancing Speed, Generalizability, and Interpretability through Wavelet Transformation

Lorenzo Ruggeri
Manuel Tognon
Rosalba Giugno

Abstract

Transcription factors (TFs) regulate gene expression by binding to short, specific DNA sequences known as transcription factor binding sites (TFBSs). Accurate identification of TFBSs is fundamental for understanding transcriptional regulation. By leveraging its ability to capture the complex non-linear patterns and hierarchical dependencies underlying TF-DNA binding, deep learning (DL) has emerged as the state-of-the-art approach for modeling and identifying TFBSs. However, current models often require extensive pretraining, involve large parameter sets, and offer limited interpretability. To address these limitations, we introduce WaveDNA, a lightweight and interpretable DL framework that encodes DNA sequences into two-dimensional representations using wavelet transforms. This approach enables the use of convolutional neural networks pretrained on images, facilitating efficient transfer learning without requiring large-scale genomic data pretraining. Across diverse ENCODE ChIP-seq datasets spanning different TFs, WaveDNA achieves predictive accuracy comparable to state-of-the-art DL models while using approximately fivefold fewer parameters and substantially fewer computational resources. Moreover, representing DNA sequences as images allows the direct application of established computer vision interpretability techniques to visualize the learned binding patterns. Together, these results demonstrate that WaveDNA offers
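The abstract does not say which wavelet, scales, or numeric DNA encoding WaveDNA uses. As a rough illustration of the general idea of turning a sequence into a two-dimensional, image-like array, the sketch below assumes one 0/1 indicator track per nucleotide and a Ricker (Mexican hat) wavelet evaluated at a few hand-picked scales. The function names (`indicator_track`, `ricker`, `cwt`, `encode`) and the scale values are hypothetical and are not taken from the preprint.

```python
import math

# Assumption: one 0/1 indicator track per nucleotide (not confirmed by the abstract).
NUCLEOTIDES = "ACGT"


def indicator_track(seq, base):
    """Return a 0/1 signal marking the positions where `base` occurs in `seq`."""
    return [1.0 if ch == base else 0.0 for ch in seq.upper()]


def ricker(t, scale):
    """Ricker (Mexican hat) wavelet evaluated at offset `t` for a given scale."""
    x = t / scale
    norm = 2.0 / (math.sqrt(3.0 * scale) * math.pi ** 0.25)
    return norm * (1.0 - x * x) * math.exp(-0.5 * x * x)


def cwt(signal, scales):
    """Continuous wavelet transform by direct correlation, zero-padded at the edges."""
    n = len(signal)
    rows = []
    for s in scales:
        half = int(math.ceil(5 * s))  # the wavelet is negligible beyond about 5 scales
        offsets = range(-half, half + 1)
        kernel = [ricker(k, s) for k in offsets]
        row = []
        for i in range(n):
            acc = 0.0
            for k, w in zip(offsets, kernel):
                j = i + k
                if 0 <= j < n:
                    acc += signal[j] * w
            row.append(acc)
        rows.append(row)
    return rows  # one row per scale, one column per sequence position


def encode(seq, scales=(1, 2, 4, 8)):
    """Stack one scalogram per nucleotide: nested lists of shape (4, len(scales), len(seq))."""
    return [cwt(indicator_track(seq, b), scales) for b in NUCLEOTIDES]


image = encode("ACGTTGACGTCA")
print(len(image), len(image[0]), len(image[0][0]))  # channels, scales, positions
```

Each nucleotide track yields a scales-by-positions grid, so the stacked result resembles a multi-channel image. How such channels would be mapped onto the input expected by an image-pretrained convolutional network is left open here, since the abstract does not describe it.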

Version published to 10.1101/2025.11.07.687194 on bioRxiv
Nov 10, 2025

Related articles

DNABERT2-CAMP: A Hybrid Transformer-CNN Model for E. coli Promoter Recognition

This article has 4 authors:
1. Hua-Lin Xu
2. Xiu-Jun Gong
3. Hua Yu
4. Ying-Kai Wang
This article has no evaluations
Latest version Dec 28, 2025

Benchmarking Genomic Foundation Models for Gene Fusion Detection from DNA Sequences

This article has 5 authors:
1. Radim Krupička
2. Mariana Komárková
3. Bohuslav Dvorský
4. Kateřina Kollinová
5. Ondřej Klempíř
This article has no evaluations
Latest version Dec 23, 2025

Understanding Pathways in Bioinformatics, Genomics, and Health Applications

This article has 1 author:
1. Diptarup Mallick
This article has no evaluations
Latest version Jan 19, 2026
