QuadStack: Specialized convolutional blocks enable in vivo BG4-binding motif prediction and highlight discrepancies with in vitro G-quadruplexes
Abstract
G-quadruplex (G4) prediction has been largely guided by in vitro biophysical rules, yet models built on these rules show limited agreement with in vivo measurements. Here, we present QuadStack, a deep learning model trained on a multi-study BG4-ChIP-seq compendium. QuadStack introduces two biologically grounded convolutional modules: G4Stack Convolution, which captures G/C stacking patterns, and Reverse Complement Convolution, which enforces strand-invariant representations consistent with ChIP-seq signals. QuadStack achieves strong predictive performance (AUC up to 0.94) and substantially outperforms widely used in vitro-based predictors on genomic test data. Beyond performance, our analyses reveal that BG4-associated sequence grammar is governed not solely by canonical isolated G-rich tracts, but also by patterns in which G and C nucleotides are mixed. This suggests that cytosines are not simply disruptive in vivo, and raises the possibility that cytosines play a context-dependent role or that guanines on the opposite strand contribute to the structure, either of which could explain the difference between in vivo and in vitro observations. Together, these findings demonstrate a fundamental discrepancy between in vitro folding propensity and in vivo G4 biology, and establish QuadStack as both a predictive model and a framework for interpreting G4 formation in its native genomic context.
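To illustrate what a strand-invariant convolution can look like, the following is a minimal NumPy sketch, not the authors' implementation. It assumes one common construction: scan a filter over the one-hot sequence and over its reverse complement, then keep the maximum response. The function names (`one_hot`, `rc_invariant_score`, and so on), the A, C, G, T channel order, and the toy G-run filter are all hypothetical choices made for this example; the abstract does not specify how QuadStack's Reverse Complement Convolution is built.

```python
import numpy as np

# Assumed channel order. With A, C, G, T, reversing the channel axis
# swaps A<->T and C<->G, i.e. it complements every base.
ALPHABET = "ACGT"


def one_hot(seq):
    """Encode a DNA string as an (L, 4) one-hot matrix."""
    idx = np.array([ALPHABET.index(base) for base in seq.upper()])
    return np.eye(4)[idx]


def reverse_complement_onehot(x):
    """Reverse positions and channels: the one-hot reverse complement."""
    return x[::-1, ::-1]


def scan(x, w):
    """Valid cross-correlation of a (k, 4) filter along an (L, 4) sequence."""
    k = w.shape[0]
    n_windows = x.shape[0] - k + 1
    return np.array([np.sum(x[i:i + k] * w) for i in range(n_windows)])


def rc_invariant_score(seq, w):
    """Best filter response over both strands (hypothetical pooling choice)."""
    x = one_hot(seq)
    forward = scan(x, w)
    reverse = scan(reverse_complement_onehot(x), w)
    return max(forward.max(), reverse.max())


def revcomp(seq):
    """String-level reverse complement, used only to check invariance."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


# Toy filter that rewards a run of three guanines (illustrative only).
g_run_filter = np.zeros((3, 4))
g_run_filter[:, ALPHABET.index("G")] = 1.0

seq = "ACGGGTTAGGGCCCTA"
same = np.isclose(
    rc_invariant_score(seq, g_run_filter),
    rc_invariant_score(revcomp(seq), g_run_filter),
)
```

Because the score is the maximum over both strands, a sequence and its reverse complement receive the same value, which matches the fact that ChIP-seq signal does not indicate which strand carries the G-rich tract. A C-rich window such as `CCC` scores as highly as `GGG` under this scheme, since it corresponds to a G-run on the opposite strand.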