TRUHiC: A TRansformer-embedded U-2 Net to enhance Hi-C data for 3D chromatin structure characterization
Abstract
High-throughput chromosome conformation capture sequencing (Hi-C) is a key technology for studying the three-dimensional (3D) structure of genomes and chromatin folding. Hi-C data reveal underlying patterns of genome organization, such as topologically associating domains (TADs) and chromatin loops, which play critical roles in transcriptional regulation and in disease etiology and progression. However, the sparsity of existing Hi-C data often hinders robust and reliable inference of 3D structures. Hence, we propose TRUHiC, a new computational method that leverages recent state-of-the-art deep generative modeling to augment low-resolution Hi-C data for the characterization of 3D chromatin structures. By applying TRUHiC to real low-resolution Hi-C data from the GM12329 cell line and to other publicly available Hi-C data from humans and mice, we demonstrate that the augmented data significantly improve the characterization of TADs and loops across diverse cell lines and species. We further present a TRUHiC model pre-trained on human lymphoblastoid cell lines that can be adapted and transferred to improve chromatin characterization in various cell lines, tissues, and species.