METAHIT enables comprehensive and flexible genome-resolved microbiome analysis with metagenomic Hi-C
Abstract
Metagenomic Hi-C (metaHi-C) augments shotgun sequencing with in-cell proximity information, enabling genome-resolved analysis of complex communities. However, computational tools for metaHi-C remain fragmented and rarely offer comprehensive end-to-end analysis. Existing pipelines also use only chimeric Hi-C pairs and ignore non-chimeric reads, which often constitute a large fraction of Hi-C libraries. Here, we present METAHIT, an accessible and modular workflow that standardizes metaHi-C analysis from raw reads to genome-resolved outputs. The pipeline provides alignment-derived, assumption-light quality checks and integrates three state-of-the-art Hi-C-based binners, consolidating their outputs into a single, non-redundant set of metagenome-assembled genomes (MAGs). For the first time, it also reuses informative intra-contig read pairs that other Hi-C workflows discard: an expectation-maximization (EM) model on gap distances identifies shotgun-like reads, which are then used for per-bin reassembly. METAHIT also supports Hi-C-guided scaffolding, focused visualizations of scaffold structure, MAG annotation, and analysis of mobile genetic element (MGE)–host interactions. Across six habitats spanning host-associated and environmental microbiomes, METAHIT increases the recovery of near-complete and high-quality MAGs relative to established Hi-C baselines, while per-bin reassembly lowers contamination and maintains completeness. Applied to a single sheep-gut long-read metaHi-C sample, METAHIT recovered 929 high-quality genomes, representing, to our knowledge, the highest species richness reported from a single sample, and revealed expanded diversity within Erysipelotrichales. In the human gut, METAHIT improved contiguity for an abundant Bacteroides vulgatus MAG via Hi-C-guided scaffolding, identified candidate novel Faecalibacterium lineages, and resolved MGE–host links involving F. prausnitzii and the novel Faecalibacterium MAG.
Together, METAHIT delivers standardized, inspection-ready, genome-resolved outputs for comparative, hypothesis-driven microbiome studies across protocols and sequencing modalities.
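The abstract does not say which distributions the EM model on gap distances uses, so the following is only an illustrative sketch of the general idea, not METAHIT's implementation. It assumes that intra-contig read pairs can be described by a two-component Gaussian mixture on log10 gap distance: a narrow short-gap component (shotgun-like pairs) and a broad long-gap component (proximity-ligation pairs). The function names `fit_gap_mixture` and `shotgun_like`, the choice of log10 scale, and the posterior threshold are all hypothetical.

```python
import math


def normal_pdf(x, mu, sigma):
    """Density of the normal distribution N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def fit_gap_mixture(gaps_bp, n_iter=200, min_sigma=1e-3):
    """Fit a two-component Gaussian mixture to log10 gap distances with EM.

    Assumption (not from the paper): gaps are modelled on a log10 scale.
    Returns (weights, means, sigmas, resp); resp[i] is the posterior
    probability, from the last E-step, that pair i belongs to the
    short-gap component (index 0).
    """
    # Clamp at 1 bp so that log10 is always defined.
    xs = [math.log10(max(g, 1)) for g in gaps_bp]
    n = len(xs)
    ordered = sorted(xs)
    # Initialise the means at the lower and upper quartiles and let both
    # components start with the overall standard deviation.
    mu = [ordered[n // 4], ordered[(3 * n) // 4]]
    mean_all = sum(xs) / n
    sd_all = math.sqrt(sum((x - mean_all) ** 2 for x in xs) / n)
    sigma = [max(sd_all, min_sigma)] * 2
    w = [0.5, 0.5]
    resp = [0.5] * n

    for _ in range(n_iter):
        # E-step: posterior probability of the short-gap component.
        for i, x in enumerate(xs):
            p0 = w[0] * normal_pdf(x, mu[0], sigma[0])
            p1 = w[1] * normal_pdf(x, mu[1], sigma[1])
            total = p0 + p1
            resp[i] = p0 / total if total > 0.0 else 0.5
        # M-step: re-estimate weight, mean, and spread of each component.
        for k in (0, 1):
            r = resp if k == 0 else [1.0 - v for v in resp]
            nk = max(sum(r), 1e-12)
            w[k] = nk / n
            mu[k] = sum(ri * x for ri, x in zip(r, xs)) / nk
            var = sum(ri * (x - mu[k]) ** 2 for ri, x in zip(r, xs)) / nk
            sigma[k] = max(math.sqrt(var), min_sigma)

    # Keep component 0 as the short-gap (shotgun-like) component.
    if mu[0] > mu[1]:
        w.reverse()
        mu.reverse()
        sigma.reverse()
        resp = [1.0 - v for v in resp]
    return w, mu, sigma, resp


def shotgun_like(gaps_bp, threshold=0.9):
    """Flag pairs whose short-gap posterior is at least `threshold`."""
    _, _, _, resp = fit_gap_mixture(gaps_bp)
    return [r >= threshold for r in resp]
```

In a workflow of this kind, the flagged pairs would be the ones passed on to per-bin reassembly. The threshold trades the number of reused pairs against the risk of including true long-range Hi-C contacts.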