QuadStack: Specialized convolutional blocks enable in vivo BG4-binding motif prediction and highlight discrepancies with in vitro G-quadruplexes

Abstract

G-quadruplex (G4) prediction has been largely guided by in vitro biophysical rules, yet these models show limited agreement with in vivo measurements. Here, we present QuadStack, a deep learning model trained on a multi-study BG4-ChIP-seq compendium. QuadStack introduces two biologically grounded convolutional modules: G4Stack Convolution, which captures G/C stacking patterns, and Reverse Complement Convolution, which enforces strand-invariant representations consistent with ChIP-seq signals. QuadStack achieves strong predictive performance (AUC up to 0.94) and substantially outperforms widely used in vitro-based predictors on genomic test data. Beyond performance, our analyses reveal that BG4-associated sequence grammar is governed not solely by canonical isolated G-rich tracts, but also by patterns in which G and C nucleotides are mixed. This suggests that cytosines are not simply disruptive in vivo, and raises the possibility that cytosines play a context-dependent role or that guanines on the opposite strand contribute to the structure, either of which could explain the difference between in vivo and in vitro observations. Together, these findings demonstrate a fundamental discrepancy between in vitro folding propensity and in vivo G4 biology, and establish QuadStack as both a predictive model and a framework for interpreting G4 formation in its native genomic context.
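
The abstract does not say how Reverse Complement Convolution is implemented. As an illustration of the general idea of a strand-invariant convolution, the following is a minimal NumPy sketch, not the authors' code: one filter is scanned over a one-hot sequence and over its reverse complement, and the two scans are combined by an element-wise maximum. All function names, the filter width, and the random weights are hypothetical choices made for this example.

```python
import numpy as np

ALPHABET = "ACGT"  # with this channel order, complementing = reversing the channel axis


def one_hot(seq):
    """Encode a DNA string as an (L, 4) one-hot array in A, C, G, T order."""
    idx = np.array([ALPHABET.index(base) for base in seq])
    return np.eye(4)[idx]


def reverse_complement(x):
    """Reverse the positions and swap A<->T, C<->G by flipping both axes."""
    return x[::-1, ::-1]


def scan(x, w):
    """Slide a (k, 4) filter over an (L, 4) sequence (valid positions only)."""
    k = w.shape[0]
    n_positions = x.shape[0] - k + 1
    return np.array([np.sum(x[i:i + k] * w) for i in range(n_positions)])


def rc_invariant_scan(x, w):
    """Combine forward-strand and reverse-strand scans position by position.

    The reverse-strand scan is flipped so that index i refers to the same
    genomic window in both scans before taking the maximum.
    """
    forward = scan(x, w)
    reverse = scan(reverse_complement(x), w)[::-1]
    return np.maximum(forward, reverse)


# Hypothetical example: a G-rich sequence and an arbitrary random filter.
rng = np.random.default_rng(0)
w = rng.normal(size=(6, 4))
x = one_hot("GGGTTAGGGTTAGGGTTAGGG")

profile_fwd = rc_invariant_scan(x, w)
profile_rev = rc_invariant_scan(reverse_complement(x), w)

# Feeding the opposite strand yields the same profile, read in reverse,
# so any global pooling over positions gives an identical value.
print(np.allclose(profile_fwd, profile_rev[::-1]))
print(np.isclose(profile_fwd.max(), profile_rev.max()))
```

Taking the maximum over strands is only one way to obtain invariance; summing the two scans or sharing weights between a filter and its reverse complement would serve the same purpose. The property matters for ChIP-seq-derived labels because a peak does not indicate which strand carries the motif.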
