Comparative Analysis of Deep Learning Models for Predicting Causative Regulatory Variants

Gaetano Manzo
Kathryn Borkowski
Ivan Ovcharenko

Abstract

Motivation

Genome-wide association studies (GWAS) have identified numerous noncoding variants associated with complex human diseases, disorders, and traits. However, resolving the uncertainty between GWAS association and causality remains a significant challenge. The small subset of noncoding GWAS variants with causative effects on gene regulatory elements can only be detected through accurate methods that assess the impact of DNA sequence variation on gene regulatory activity. Deep learning models, such as those based on Convolutional Neural Networks (CNNs) and transformers, have gained prominence in predicting the regulatory effects of genetic variants, particularly in enhancers, by learning patterns from genomic and epigenomic data. Despite the potential of these models, selecting the most suitable one is hindered by the lack of standardized benchmarks, consistent training conditions, and performance evaluation criteria in existing reviews.

Results

This study evaluates state-of-the-art deep learning models for predicting the effects of genetic variants on enhancer activity using nine datasets stemming from MPRA, raQTL, and eQTL experiments, profiling the regulatory impact of 54,859 SNPs across four human cell lines. The results reveal that CNN models, such as TREDNet and SEI, consistently outperform other architectures in predicting the regulatory impact of single-nucleotide polymorphisms (SNPs). However, hybrid CNN-transformer models, such as Borzoi, display superior performance in identifying causal SNPs within a linkage disequilibrium block. While fine-tuning enhances the performance of transformer-based models, it remains insufficient to surpass CNN and hybrid models when evaluated under optimized conditions.

Version published to 10.1101/2025.05.19.654920 on bioRxiv
May 24, 2025

