Decrypting viral dark matter through key proteins using an NLP-enhanced framework

Abstract

Viral sequences in diverse environments remain largely uncharacterized, impeding our comprehension of their genetic makeup, biological interactions, and potential applications. This underscores an urgent need for innovative analytical methods. Here we present the VirHost Hunter framework, which uses phage tails and lysins for efficient, high-resolution host assignment, bypassing the requirement for full genomes. By harnessing Protein Language Models and Vision Transformers, VirHost Hunter captures protein functional homology despite sequence dissimilarity, significantly boosting prediction accuracy. For disease-associated gut bacteria, calibrated VirHost Hunter surpassed existing methods, doubling phage host assignments, expanding taxonomic reach, and revealing new phages targeting gut bacteria, including Akkermansia and Prevotella. We then established a gut phage lysin database, which enabled the synthesis of a lysin that effectively and specifically targets an obesity-inducing bacterium. VirHost Hunter's precision and scalability mark a significant leap forward in virome research and present a promising avenue for microbiome therapies.
