BandHiC: a memory-efficient and user-friendly Python package for organizing and analyzing Hi-C matrices down to sub-kilobase resolution
Abstract
Recent advances in high-resolution Hi-C and Micro-C technologies have enabled finer-scale characterization of 3D genome architecture. They also introduce substantial computational challenges: the size of a dense contact matrix grows quadratically with the number of genomic bins, so memory demands become prohibitive as resolution increases. To address this, we developed BandHiC, a memory-efficient and user-friendly Python package for organizing and analyzing Hi-C matrices down to sub-kilobase resolution. BandHiC adopts a banded storage strategy that preserves only a band of configurable width around the main diagonal of the dense contact matrix, reducing memory usage by up to 99% while maintaining fast random access and intuitive indexing operations. It also provides flexible masking mechanisms to handle missing values, outliers, and unmappable regions, and supports efficient vectorized operations built on NumPy, enabling scalable analysis of ultra-high-resolution Hi-C datasets.
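The banded storage idea can be illustrated with a small NumPy sketch. This is not BandHiC's actual API: the class `BandedSymmetric`, its methods, and the helper `band_fraction` are hypothetical names introduced here only to show how keeping the diagonals within a fixed bandwidth of a symmetric matrix still allows vectorized random access, and how a boolean mask of the same shape could flag invalid entries.

```python
import numpy as np


def band_fraction(n_bins, bandwidth):
    """Fraction of a full n x n dense matrix occupied by the band layout."""
    return (bandwidth + 1) / n_bins


class BandedSymmetric:
    """Toy banded store for a symmetric matrix (hypothetical, not BandHiC)."""

    def __init__(self, n_bins, bandwidth, dtype=np.float32):
        self.n_bins = n_bins
        self.bandwidth = bandwidth
        # band[i, k] holds M[i, i + k] for 0 <= k <= bandwidth.
        self.band = np.zeros((n_bins, bandwidth + 1), dtype=dtype)
        # mask[i, k] is True where the stored entry should be ignored.
        self.mask = np.zeros((n_bins, bandwidth + 1), dtype=bool)

    @classmethod
    def from_dense(cls, dense, bandwidth):
        n = dense.shape[0]
        obj = cls(n, bandwidth, dtype=dense.dtype)
        for k in range(bandwidth + 1):
            # The k-th upper diagonal has n - k entries.
            obj.band[: n - k, k] = np.diagonal(dense, offset=k)
            # Cells past the matrix edge are padding, so mask them.
            obj.mask[n - k:, k] = True
        return obj

    def get(self, i, j, default=0):
        """Vectorized lookup of M[i, j]; entries outside the band give default."""
        i, j = np.asarray(i), np.asarray(j)
        row = np.minimum(i, j)          # symmetry: use the upper triangle
        offset = np.abs(i - j)
        inside = offset <= self.bandwidth
        safe_offset = np.where(inside, offset, 0)  # avoid out-of-range index
        return np.where(inside, self.band[row, safe_offset], default)

    def mask_bin(self, b):
        """Mask every stored entry that involves bin b (e.g. unmappable)."""
        self.mask[b, :] = True                     # entries M[b, b + k]
        ks = np.arange(1, self.bandwidth + 1)
        rows = b - ks                              # entries M[b - k, b]
        valid = rows >= 0
        self.mask[rows[valid], ks[valid]] = True

    def masked_mean(self):
        """Mean over stored, unmasked entries."""
        return np.ma.masked_array(self.band, mask=self.mask).mean()
```

As a purely arithmetic illustration (not a benchmark from the paper): with 100,000 bins and a bandwidth of 999 bins, `band_fraction(100_000, 999)` is 0.01, meaning the band layout holds 1% of the entries of the full dense matrix. The trade-off is that contacts farther from the diagonal than the chosen bandwidth are not stored and fall back to a default value on lookup.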