Synthetic community Hi-C benchmarking provides a baseline for virus-host inferences



Abstract

Microbiomes are now recognized as key influencers of diverse ecosystems, but it is increasingly evident that viruses impose significant constraints on these microbial communities. While viromics has expanded virus genomic catalogs, identifying the hosts of these viruses remains a major challenge because cultivation scales poorly and because in silico predictions are of uncertain reliability for understudied regions of the virosphere. Hi-C, a proximity-ligation-based method and a promising recent advance, aims to infer virus-host linkages by analyzing sequences from cross-linked virus and host genomic fragments. This approach has been applied in at least seven studies, yet its accuracy has not been systematically assessed. Here we evaluate Hi-C performance in predicting virus-host interactions using a synthetic community consisting of four bacterial strains and nine phages with known, experimentally determined, quantitative interactions. Our analysis revealed that Hi-C linkage scores as used in the literature perform poorly (13% specificity, 100% sensitivity). By converting linkage scores to Z-scores and filtering (Z-score ≥ 0.5), we substantially increased specificity at the cost of reduced sensitivity (96% specificity, 57% sensitivity). These findings provide empirical data and establish guidelines for interpreting Hi-C-inferred virus-host linkages, with the aim of improving the method's reliability across diverse ecosystems.
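
The Z-score filtering step and the two reported metrics can be illustrated with a minimal sketch. The abstract does not state how the Z-scores are computed, so the code below assumes the simplest scheme: standardizing each linkage score against the mean and population standard deviation of all virus-host pair scores. The pair names, the linkage values, and the function names are hypothetical and chosen only for illustration; they are not the study's data or its pipeline.

```python
from statistics import mean, pstdev


def z_scores(scores):
    """Standardize linkage scores across all virus-host pairs.

    Assumption: one global mean and standard deviation over every pair.
    The abstract does not specify the normalization actually used.
    """
    values = list(scores.values())
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All scores identical: no pair stands out from the background.
        return {pair: 0.0 for pair in scores}
    return {pair: (s - mu) / sigma for pair, s in scores.items()}


def predict(zs, threshold=0.5):
    """Keep pairs whose Z-score meets the threshold (>= 0.5 in the abstract)."""
    return {pair for pair, z in zs.items() if z >= threshold}


def sensitivity_specificity(predicted, true_pairs, all_pairs):
    """Return (sensitivity, specificity) against known interactions."""
    negatives = all_pairs - true_pairs
    tp = len(predicted & true_pairs)
    fn = len(true_pairs - predicted)
    fp = len(predicted & negatives)
    tn = len(negatives - predicted)
    sens = tp / (tp + fn) if (tp + fn) else float("nan")
    spec = tn / (tn + fp) if (tn + fp) else float("nan")
    return sens, spec


# Hypothetical toy data: raw Hi-C linkage scores for six virus-host pairs.
linkage = {
    ("phage_A", "host_1"): 120.0,
    ("phage_A", "host_2"): 4.0,
    ("phage_B", "host_1"): 6.0,
    ("phage_B", "host_2"): 90.0,
    ("phage_C", "host_1"): 10.0,
    ("phage_C", "host_2"): 8.0,
}
# Hypothetical ground truth, standing in for experimentally determined infections.
known = {("phage_A", "host_1"), ("phage_B", "host_2"), ("phage_C", "host_1")}
all_pairs = set(linkage)

# Unfiltered: every pair with a nonzero linkage score counts as a prediction.
raw_pred = {pair for pair, s in linkage.items() if s > 0}
raw_sens, raw_spec = sensitivity_specificity(raw_pred, known, all_pairs)

# Filtered: only pairs with Z-score >= 0.5 count as predictions.
z_pred = predict(z_scores(linkage), threshold=0.5)
z_sens, z_spec = sensitivity_specificity(z_pred, known, all_pairs)

print(f"raw:      sensitivity={raw_sens:.2f} specificity={raw_spec:.2f}")
print(f"filtered: sensitivity={z_sens:.2f} specificity={z_spec:.2f}")
```

In this toy example, accepting every nonzero linkage recovers all true pairs but rejects no false ones, while the Z-score filter removes every false pair and loses one weakly linked true pair. That mirrors the direction of the trade-off reported in the abstract, not its magnitudes.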
