ChromBERT: Uncovering Chromatin State Motifs in the Human Genome Using a BERT-based Approach


Abstract

Chromatin states, fundamental to gene regulation and cellular identity, are defined by distinct combinations of histone post-translational modifications. Despite their importance, patterns within chromatin state sequences, which could provide insight into key biological functions, remain largely unexplored. In this study, we introduce ChromBERT, a BERT-based model designed to detect distinct patterns in sequences of chromatin state annotations. ChromBERT was pre-trained on promoter regions across a diverse range of epigenomes and subsequently fine-tuned on a dataset from multiple cell lines for which RNA-seq data were available, highlighting the model’s ability to discern conserved chromatin state patterns within these regions. In addition to its predictive power across tasks, evidenced by high AUC scores, ChromBERT supports further analysis through motif clustering with Dynamic Time Warping (DTW). This method enhances the model’s ability to dissect chromatin state sequence motifs, which typically involve transcription and enhancer sites. Incorporating DTW-based motif clustering into ChromBERT’s workflow is expected to facilitate the discovery of genomic regions linked to novel biological functions, deepening our understanding of chromatin state dynamics.
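
To illustrate why DTW suits motif comparison, the sketch below computes a DTW distance between chromatin state sequences written as strings of state labels. It is a minimal, generic example and not ChromBERT's implementation: the 0/1 mismatch cost, the single-letter state labels, and the example motifs are all assumptions made for illustration, since the abstract does not specify them.

```python
def dtw_distance(a, b):
    """Classic dynamic-programming DTW between two label sequences.

    Assumption: a 0/1 mismatch cost between state labels (0 if equal, 1 otherwise).
    Both sequences are expected to be non-empty.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] holds the minimal alignment cost of a[:i] against b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = 0.0 if a[i - 1] == b[j - 1] else 1.0
            # Extend the cheapest of: stretch b's label, stretch a's label, or advance both
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


# Hypothetical motifs: each letter stands for one chromatin state label per genomic bin
motif_a = "AAABBBCC"
motif_b = "ABCC"      # same order of states as motif_a, but shorter runs
motif_c = "DDDDEEEE"  # entirely different states

motifs = [motif_a, motif_b, motif_c]
# Pairwise distance matrix, the usual input to a clustering step
distances = [[dtw_distance(x, y) for y in motifs] for x in motifs]
```

Because DTW allows one label to align with a run of repeated labels, `motif_a` and `motif_b` end up at distance zero even though their lengths differ, which is the property that lets motifs with the same state order but different run lengths fall into one cluster. The resulting matrix could then be passed to any distance-based clustering method.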
