Developing Foundation Models for Predicting Viral Animal Host Range in Intelligent Surveillance

Abstract

Emerging human infectious viruses originating from animals remain a persistent threat to global public health. Understanding the host range of animal viruses is crucial for identifying potential spillover pathways and mitigating the risk of future pandemics. Here, we present VirHRanger, a prediction method that integrates foundation models trained on viral genome and protein sequences with genomic and protein compositional traits, viral phylogeny, and protein-protein interactions. To systematically predict animal host range, VirHRanger incorporates host taxonomy-aware neural networks trained on a comprehensive collection of animal-virus associations spanning mammals, birds, and arthropods. On a dataset of 4,006 virus species from 99 viral families, the model achieved a micro-averaged AUROC of 0.938 across all host categories, demonstrating its effectiveness in capturing generalizable host signals from viral genetic data. On a dataset of 315 novel viruses associated with key reservoir animal hosts and insect vectors, VirHRanger notably outperformed the homology-based method, indicating strong generalizability to novel viruses. Furthermore, VirHRanger identified host range variation among closely related viruses within the Coronaviridae family and successfully predicted the ability of SARS-CoV-2 to infect humans and other animal hosts. These findings highlight the potential of VirHRanger to transform sequencing data into timely insights for disease control during the early stages of zoonotic outbreaks.
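The micro-averaged AUROC pools every (virus, host category) prediction into a single ranking problem. Below is a minimal sketch of that metric in Python. It assumes a binary virus-by-host label matrix and a matching matrix of predicted scores. The function names and the toy labels and scores are hypothetical and are not drawn from VirHRanger's data. The code follows the standard metric definition and is not the authors' evaluation code.

```python
from typing import List, Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the Mann-Whitney U statistic: the fraction of (positive, negative)
    pairs in which the positive is scored higher. Tied pairs count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def micro_auroc(y_true: List[List[int]], y_score: List[List[float]]) -> float:
    """Micro-average: flatten the virus x host-category matrices so that every
    (virus, host) pair is one sample, then compute a single AUROC."""
    flat_labels = [y for row in y_true for y in row]
    flat_scores = [s for row in y_score for s in row]
    return auroc(flat_labels, flat_scores)


# Hypothetical toy example: 3 viruses x 3 host categories (mammals, birds, arthropods).
y_true = [[1, 0, 0], [1, 1, 0], [0, 0, 1]]
y_score = [[0.9, 0.2, 0.1], [0.7, 0.6, 0.3], [0.4, 0.65, 0.8]]
print(micro_auroc(y_true, y_score))
```

In this toy example, 19 of the 20 positive/negative pairs are ranked correctly. The exception is the bird positive scored 0.6 against the bird negative scored 0.65, which gives 0.95. Because every pair is pooled, host categories with many known associations carry more weight here than in a macro average, which computes one AUROC per category and then averages the results.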