METAHIT enables comprehensive and flexible genome-resolved microbiome analysis with metagenomic Hi-C


Abstract

Metagenomic Hi-C (metaHi-C) augments shotgun sequencing with in-cell proximity information, enabling genome-resolved analysis of complex communities. However, computational tools for metaHi-C remain fragmented and rarely offer end-to-end analysis. Existing pipelines also use only chimeric Hi-C pairs and ignore non-chimeric reads, which often constitute a large fraction of Hi-C libraries. Here, we present METAHIT, an accessible and modular workflow that standardizes metaHi-C analysis from raw reads to genome-resolved outputs. The pipeline provides alignment-derived, assumption-light quality checks and integrates three state-of-the-art Hi-C-based binners by consolidating their outputs into a single, non-redundant set of metagenome-assembled genomes (MAGs). For the first time, it also reuses informative intra-contig read pairs that other Hi-C workflows discard: an EM model on gap distances identifies shotgun-like reads, which are then used for per-bin reassembly. METAHIT also supports Hi-C-guided scaffolding, focused visualizations for scaffold structure, MAG annotation, and mobile genetic element (MGE)–host interactions. Across six habitats spanning host-associated and environmental microbiomes, METAHIT increases the recovery of near-complete and high-quality MAGs relative to established Hi-C baselines, while per-bin reassembly lowers contamination and maintains completeness. Applied to a single sheep-gut long-read metaHi-C sample, METAHIT recovered 929 high-quality genomes, representing, to our knowledge, the highest species richness reported from a single sample, and revealed expanded diversity within Erysipelotrichales. In the human gut, METAHIT improved contiguity for an abundant Bacteroides vulgatus MAG via Hi-C-guided scaffolding, identified candidate novel Faecalibacterium lineages, and resolved MGE–host links involving F. prausnitzii and the novel Faecalibacterium MAG.
Together, METAHIT delivers standardized, inspection-ready, genome-resolved outputs for comparative, hypothesis-driven microbiome studies across protocols and sequencing modalities.
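
The abstract mentions an EM model on gap distances for separating shotgun-like pairs from true Hi-C ligation pairs but gives no details. As a rough illustration only, the sketch below fits a two-component Gaussian mixture to log10 gap distances with EM. The mixture form, the log transform, the function names (`fit_gap_mixture`, `simulate_log_gaps`), and the simulated data are all assumptions made for this example and are not taken from METAHIT.

```python
import math
import random


def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and sd at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def e_step(xs, weights, means, sds):
    """Return (responsibilities of component 0, total log-likelihood)."""
    resp, loglik = [], 0.0
    for x in xs:
        p0 = weights[0] * normal_pdf(x, means[0], sds[0])
        p1 = weights[1] * normal_pdf(x, means[1], sds[1])
        total = p0 + p1 + 1e-300  # guard against log(0) and 0/0
        resp.append(p0 / total)
        loglik += math.log(total)
    return resp, loglik


def fit_gap_mixture(log_gaps, n_iter=200, tol=1e-8, min_sd=1e-3):
    """Fit a two-component Gaussian mixture to log10 gap distances via EM.

    Hypothetical helper, not METAHIT's API. Component 0 is the short-gap
    (assumed shotgun-like) component; resp[i] is its posterior for pair i.
    """
    n = len(log_gaps)
    ordered = sorted(log_gaps)
    # Initialise the two means at the lower and upper quartiles.
    means = [ordered[n // 4], ordered[(3 * n) // 4]]
    grand_mean = sum(ordered) / n
    grand_var = sum((x - grand_mean) ** 2 for x in ordered) / n
    sds = [max(math.sqrt(grand_var), min_sd)] * 2
    weights = [0.5, 0.5]
    prev = -math.inf
    for _ in range(n_iter):
        resp, loglik = e_step(log_gaps, weights, means, sds)
        if loglik - prev < tol:
            break
        prev = loglik
        # M-step: re-estimate weight, mean, and sd of each component.
        for k in (0, 1):
            r = resp if k == 0 else [1.0 - v for v in resp]
            nk = max(sum(r), 1e-12)
            weights[k] = nk / n
            means[k] = sum(ri * x for ri, x in zip(r, log_gaps)) / nk
            var = sum(ri * (x - means[k]) ** 2 for ri, x in zip(r, log_gaps)) / nk
            sds[k] = max(math.sqrt(var), min_sd)
    resp, _ = e_step(log_gaps, weights, means, sds)
    if means[0] > means[1]:  # keep component 0 as the short-gap one
        weights.reverse()
        means.reverse()
        sds.reverse()
        resp = [1.0 - v for v in resp]
    return weights, means, sds, resp


def simulate_log_gaps(n_short=300, n_long=300, seed=0):
    """Simulated log10 gaps: short insert-like pairs, then long-range pairs."""
    rng = random.Random(seed)
    short = [rng.gauss(2.5, 0.1) for _ in range(n_short)]
    long_range = [rng.gauss(4.5, 0.4) for _ in range(n_long)]
    return short + long_range


if __name__ == "__main__":
    weights, means, sds, resp = fit_gap_mixture(simulate_log_gaps())
    shotgun_like = sum(r > 0.5 for r in resp)
    print(f"component means (log10 bp): {means[0]:.2f}, {means[1]:.2f}")
    print(f"pairs called shotgun-like: {shotgun_like} of {len(resp)}")
```

In this sketch, pairs whose posterior for the short-gap component exceeds 0.5 would be treated as shotgun-like and passed on to reassembly; that cutoff is likewise an illustrative choice rather than a documented METAHIT parameter.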
