Improving DNA Modeling with WaveDNA: Enhancing Speed, Generalizability, and Interpretability through Wavelet Transformation

Abstract

Transcription factors (TFs) regulate gene expression by binding to short, specific DNA sequences known as transcription factor binding sites (TFBSs). Accurate identification of TFBSs is fundamental for understanding transcriptional regulation. Because it can capture the complex non-linear patterns and hierarchical dependencies underlying TF-DNA binding, deep learning (DL) has emerged as the state-of-the-art approach for modeling and identifying TFBSs. However, current models often require extensive pretraining, involve large parameter sets, and offer limited interpretability. To address these limitations, we introduce WaveDNA, a lightweight and interpretable DL framework that encodes DNA sequences into two-dimensional representations using wavelet transforms. This approach enables the use of convolutional neural networks pretrained on images, facilitating efficient transfer learning without requiring large-scale genomic data pretraining. Across diverse ENCODE ChIP-seq datasets spanning different TFs, WaveDNA achieves predictive accuracy comparable to state-of-the-art DL models while using approximately fivefold fewer parameters and substantially fewer computational resources. Moreover, representing DNA sequences as images allows the direct application of established computer vision interpretability techniques to visualize the learned binding patterns. Together, these results demonstrate that WaveDNA offers
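
To make the idea of a wavelet-based two-dimensional encoding concrete, the following is a minimal sketch and not the WaveDNA implementation. The abstract does not say which numeric encoding, wavelet family, or scales are used, so everything here is an assumption chosen for illustration: one-hot encoding of the four bases, a Ricker (Mexican hat) wavelet, and the hypothetical helper names `one_hot`, `ricker`, `scalogram`, and `dna_to_image`.

```python
import numpy as np

BASES = "ACGT"  # assumed channel order; not specified in the abstract


def one_hot(seq):
    """Return a (4, len(seq)) array; unknown bases (e.g. N) stay all-zero."""
    out = np.zeros((4, len(seq)), dtype=np.float64)
    for j, base in enumerate(seq.upper()):
        i = BASES.find(base)
        if i >= 0:
            out[i, j] = 1.0
    return out


def ricker(points, width):
    """Ricker (Mexican hat) wavelet sampled at `points` positions."""
    t = np.arange(points) - (points - 1) / 2.0
    norm = 2.0 / (np.sqrt(3.0 * width) * np.pi ** 0.25)
    return norm * (1.0 - (t / width) ** 2) * np.exp(-(t ** 2) / (2.0 * width ** 2))


def scalogram(signal, widths):
    """Continuous wavelet transform of a 1-D signal: one row per width."""
    rows = []
    for w in widths:
        # Keep the kernel no longer than the signal so "same" preserves length.
        n = min(10 * int(w), len(signal))
        rows.append(np.convolve(signal, ricker(n, w), mode="same"))
    return np.stack(rows)


def dna_to_image(seq, widths=(1, 2, 4, 8)):
    """Stack one scalogram per base: shape (4, len(widths), len(seq))."""
    channels = one_hot(seq)
    return np.stack([scalogram(ch, widths) for ch in channels])


image = dna_to_image("ACGTACGTACGTACGTACGT")
print(image.shape)
```

Each base channel becomes a scale-by-position map, so a sequence turns into a small stack of 2D arrays. How such maps would be combined into the three-channel input that image-pretrained networks typically expect is left open here, since the abstract does not describe that step.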
