ChromBERT: Uncovering Chromatin State Motifs in the Human Genome Using a BERT-based Approach

Abstract

Chromatin states, which are defined by specific combinations of histone post-translational modifications, are fundamental to gene regulation and cellular identity. Despite their importance, a comprehensive characterization of the patterns within chromatin state sequences, which could provide insights into key biological functions, remains largely unexplored. In this study, we introduce ChromBERT, a BERT-based model specifically designed to detect distinct chromatin state patterns as "motifs." We pre-trained ChromBERT on 15-state chromatin annotations from 127 human cell and tissue types from the ROADMAP consortium. The pre-trained model can be fine-tuned for various downstream tasks, after which the chromatin state patterns that receive high attention are extracted as motifs. Because chromatin state motifs vary in length, ChromBERT uses Dynamic Time Warping (DTW) to cluster similar motifs and identify meaningful representative patterns. We evaluated the model on several tasks, including binary and quantitative gene expression prediction, cell type classification, and three-dimensional genome feature classification. Our analyses yielded biologically grounded results and revealed the associated chromatin state motifs. This workflow facilitates the discovery of specific chromatin state patterns across different biological contexts and offers a new framework for exploring the dynamics of epigenomic states.
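The abstract describes two post-processing steps: extracting high-attention chromatin state patterns as motifs, and clustering those variable-length motifs with Dynamic Time Warping. The sketch below illustrates both steps under stated assumptions and is not ChromBERT's actual implementation. It assumes that states are integer labels (1 to 15) and that per-position attention scores are already available. It also assumes that a motif is any contiguous run of positions whose attention meets a threshold, that DTW uses a 0/1 cost for mismatched state labels, and that clusters are formed by single-linkage merging under a distance threshold, with the medoid serving as each cluster's representative. The function names `extract_high_attention_regions`, `dtw_distance`, and `cluster_motifs`, as well as both thresholds, are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Motif = List[int]


def extract_high_attention_regions(
    states: Sequence[int], attention: Sequence[float], threshold: float, min_len: int = 2
) -> List[Motif]:
    """Hypothetical rule: collect contiguous runs whose attention is >= threshold."""
    if len(states) != len(attention):
        raise ValueError("states and attention must have the same length")
    motifs: List[Motif] = []
    start = None  # start index of the current high-attention run, if any
    for i, score in enumerate(attention):
        if score >= threshold and start is None:
            start = i
        elif score < threshold and start is not None:
            if i - start >= min_len:
                motifs.append(list(states[start:i]))
            start = None
    # Close a run that extends to the end of the sequence.
    if start is not None and len(attention) - start >= min_len:
        motifs.append(list(states[start:]))
    return motifs


def dtw_distance(a: Sequence[int], b: Sequence[int]) -> float:
    """Classic DTW between two state sequences with an assumed 0/1 mismatch cost."""
    n, m = len(a), len(b)
    inf = float("inf")
    # dp[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    dp = [[inf] * (m + 1) for _ in range(n + 1)]
    dp[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0.0 if a[i - 1] == b[j - 1] else 1.0
            dp[i][j] = cost + min(dp[i - 1][j], dp[i][j - 1], dp[i - 1][j - 1])
    return dp[n][m]


def cluster_motifs(motifs: List[Motif], threshold: float) -> List[Tuple[Motif, List[Motif]]]:
    """Single-linkage clustering on DTW distance; each cluster is represented by its medoid."""
    n = len(motifs)
    dist = [[dtw_distance(motifs[i], motifs[j]) for j in range(n)] for i in range(n)]
    parent = list(range(n))  # union-find forest over motif indices

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Merge every pair of motifs whose DTW distance is within the threshold.
    for i in range(n):
        for j in range(i + 1, n):
            if dist[i][j] <= threshold:
                parent[find(i)] = find(j)

    groups: Dict[int, List[int]] = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)

    clusters = []
    for members in groups.values():
        # Medoid: the member with the smallest total DTW distance to the others.
        medoid = min(members, key=lambda k: sum(dist[k][j] for j in members))
        clusters.append((motifs[medoid], [motifs[k] for k in members]))
    return clusters


# Toy example with integer state labels; values are illustrative, not real data.
states = [15, 15, 1, 1, 2, 5, 15, 1, 2, 15]
attention = [0.1, 0.1, 0.9, 0.8, 0.7, 0.2, 0.1, 0.9, 0.95, 0.1]
motifs = extract_high_attention_regions(states, attention, threshold=0.5)
clusters = cluster_motifs(motifs, threshold=0.5)
print(motifs)  # [[1, 1, 2], [1, 2]]
print(clusters)  # [([1, 1, 2], [[1, 1, 2], [1, 2]])]
```

DTW lets `[1, 1, 2]` and `[1, 2]` align at zero cost by stretching the repeated state, which is why motifs of different lengths can fall into one cluster. Raw DTW cost grows with sequence length, so a real pipeline might normalize it by path length or use a different linkage or threshold. These are design choices that the abstract does not specify.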
