HAVEN: Hierarchical Attention for Viral protEin-based host iNference



Abstract

It is crucial to accurately predict hosts of viruses to understand and anticipate human infectious diseases that originate from animals. There is a lack of versatile models that handle out-of-distribution factors such as unseen hosts and viruses. We develop a machine learning model for predicting the host infected by a virus, given only the sequence of a protein encoded by the genome of that virus. Our approach, HAVEN, is the first to apply to multiple hosts and to generalize to unseen hosts and viruses. HAVEN is a transformer-based architecture coupled with hierarchical self-attention that can accept sequences of highly diverse lengths. We integrate HAVEN with a prototype-based few-shot learning classifier to predict rare classes. We demonstrate the accuracy, robustness, and generalizability of HAVEN through a comprehensive series of experiments. In particular, we show that HAVEN can achieve a median AUPRC of 0.67 while predicting common hosts. Moreover, HAVEN retains this AUPRC value even for rare hosts (median prevalence as low as 0.09%). Our model performs on par with state-of-the-art foundation models, which are 65 to 5,000 times larger in size, and outperforms them in identifying hosts of SARS-CoV-2 variants of concern.
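The prototype-based few-shot classifier mentioned above can be illustrated with a minimal prototypical-network-style sketch: each class is summarized by the mean embedding of its few support examples, and a query is assigned to the nearest prototype. This is a generic illustration of the technique, not HAVEN's actual implementation; all names and the toy data are hypothetical.

```python
import numpy as np

def prototype_classify(support_embeddings, support_labels, query_embeddings):
    """Assign each query to the class whose prototype (the mean of that
    class's support embeddings) is nearest in Euclidean distance."""
    classes = np.unique(support_labels)
    # One prototype per class: mean embedding over its support examples.
    prototypes = np.stack([
        support_embeddings[support_labels == c].mean(axis=0) for c in classes
    ])
    # Distance from every query to every prototype.
    dists = np.linalg.norm(
        query_embeddings[:, None, :] - prototypes[None, :, :], axis=-1
    )
    # Nearest-prototype decision rule.
    return classes[dists.argmin(axis=1)]

# Toy example: two well-separated "host" classes in a 2-D embedding space.
support = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
labels = np.array([0, 0, 1, 1])
queries = np.array([[0.05, 0.1], [4.9, 5.1]])
print(prototype_classify(support, labels, queries))  # → [0 1]
```

Because a prototype needs only a handful of support embeddings, this rule extends naturally to rare classes with very few labeled examples.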
