Using the DNA language model, GROVER, to parse sequence and epigenetic effects on genome stability

Abstract

Genome stability is shaped by DNA sequence and epigenomic context, but their relative contributions to double-strand break (DSB) sensitivity remain unclear. We show that the DNA language model GROVER can predict DSBs from sequence features such as short DNA sequences (tokens), which we linked to DSB modulation. A second model using epigenomic features outperforms the sequence-only model, highlighting complementary and cell-type-specific information in the epigenome. Integrating sequence and epigenomic data yields the best performance, demonstrating their synergy. Analyzing this integrated model revealed that the genome stability information encoded in H3K36me3 and DNase-seq signals can be learned from the sequence, whereas the information encoded in H3K27ac or H3K9me3 cannot. Embedding histone data directly into the GROVER architecture enabled cell-type-specific modeling with performance matching the full epigenome model. Our results suggest that much of the information shaping DSB patterns is already encoded in the DNA sequence itself, with the epigenome acting as a fine-tuning layer.
