Predicting host-pathogen interactions using a proteome-scale language model

Cyril Malbranke
Cecilia Fruet
Anne-Florence Bitbol

Abstract

ProteomeLM (Malbranke et al., 2025) is a proteome-scale language model trained on proteomes spanning the tree of life to reconstruct masked protein embeddings from the proteome context of each species. Its attention coefficients capture protein-protein interactions without supervision. Here, we show that this capability extends to cross-species host-pathogen interactions (HPI) across ten human pathogen taxa, spanning viruses and bacteria, and that it can be further improved with lightweight fine-tuning. We introduce ProteomeLM-HPI, a parameter-efficient adaptation via LoRA (low-rank adaptation), trained on concatenated host-pathogen proteomes to reconstruct masked pathogen embeddings from host context. ProteomeLM-HPI involves two key design choices: asymmetric masking (pathogen-heavy masking) and blocked self-attention. Systematic ablations show that both choices contribute to performance. To assess generalization, we introduce a strict cross-species benchmark that enforces pathogen-level hold-out and 40% sequence-identity filtering. On this benchmark, ProteomeLM-HPI improves AUC on 9 of 10 unseen pathogens.
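The two design choices named in the abstract can be illustrated with a minimal sketch over a concatenated [host | pathogen] protein sequence. The abstract does not give masking rates or the exact attention pattern, so everything below is an assumption rather than the authors' implementation. The names `asymmetric_mask`, `blocked_attention_mask`, `p_host` and `p_pathogen` are hypothetical, the default rates are illustrative only, and "blocked self-attention" is read here as preventing pathogen proteins from attending to other pathogen proteins, so that masked pathogen embeddings must be reconstructed from host context.

```python
import random


def asymmetric_mask(n_host, n_pathogen, p_host=0.05, p_pathogen=0.5, seed=0):
    """Boolean mask over the concatenated [host | pathogen] proteins.

    True marks a protein whose embedding is masked. Pathogen proteins are
    masked at a higher (hypothetical) rate than host proteins.
    """
    rng = random.Random(seed)
    host = [rng.random() < p_host for _ in range(n_host)]
    pathogen = [rng.random() < p_pathogen for _ in range(n_pathogen)]
    return host + pathogen


def blocked_attention_mask(n_host, n_pathogen):
    """allowed[i][j] is True if query protein i may attend to key protein j.

    Assumed pattern: pathogen proteins may attend to all host proteins and to
    themselves, but not to other pathogen proteins. Host rows are unrestricted.
    """
    n = n_host + n_pathogen
    allowed = [[True] * n for _ in range(n)]
    for i in range(n_host, n):
        for j in range(n_host, n):
            if i != j:
                allowed[i][j] = False
    return allowed


mask = asymmetric_mask(n_host=6, n_pathogen=4)
allowed = blocked_attention_mask(n_host=6, n_pathogen=4)
```

In a transformer, a boolean matrix like `allowed` would typically be turned into an additive attention bias (0 where attention is allowed, a large negative value where it is blocked) before the softmax.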

Version published on bioRxiv (DOI 10.64898/2026.05.29.728699) on May 31, 2026.
