Susagi: A Microbiome World Model

Abstract

Motivation

Accurately modelling how microbial communities assemble and change across hosts and environments is essential for analysis and intervention. Typical pipelines capture limited generalisable structure and often depend on fixed ecological unit definitions.

Results

We present Susagi (Set Unsupervised Assessment of Genetic Imposters), a permutation-invariant denoising transformer that operates directly on sets of bacterial SSU rRNA gene embeddings to learn a member-level stability function.
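As an illustration of what operating on sets implies, the following minimal sketch (not the Susagi architecture) applies one self-attention layer without positional encodings to a set of member embeddings. The weights are random, and `per_member_scores` and all dimensions are hypothetical choices made only for this example. Reordering the input members reorders the per-member scores in the same way, and any pooled summary is unchanged.

```python
import numpy as np

def self_attention(x, wq, wk, wv):
    """Single-head self-attention with no positional encoding."""
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v

def per_member_scores(x, wq, wk, wv, w_out):
    """Hypothetical head: one scalar score per set member."""
    return self_attention(x, wq, wk, wv) @ w_out

rng = np.random.default_rng(0)
n_members, dim = 5, 8  # illustrative sizes, not taken from the paper
x = rng.normal(size=(n_members, dim))  # stand-in for SSU rRNA gene embeddings
wq, wk, wv = (rng.normal(size=(dim, dim)) for _ in range(3))
w_out = rng.normal(size=dim)

perm = rng.permutation(n_members)
scores = per_member_scores(x, wq, wk, wv, w_out)
scores_perm = per_member_scores(x[perm], wq, wk, wv, w_out)

# Each member keeps its score whatever its position in the input.
order_independent = np.allclose(scores[perm], scores_perm)
# A pooled, community-level summary does not depend on input order.
pooled_equal = np.isclose(scores.mean(), scores_perm.mean())
```

Because no positional information enters the layer, the order in which community members are listed carries no signal, which is the property a set-based model relies on.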

The model was trained on ∼2 × 10⁶ bacterial community samples. We show that it reliably predicts community composition dynamics in a zero-shot setting (no task-specific training), demonstrated here across three challenging microbiomes for which traditional ML methods do not exceed random expectation. The model’s stability scores capture biological structure: across datasets, higher scores are enriched for agricultural, cropland, and soil-associated habitats, consistent with microbial communities in these environments supporting positive diversity–stability relationships. The highest stability scores are attained only by large communities with high Pielou evenness, even though the model has never seen abundances, suggesting that it can recognise community dysbiosis from presence–absence data alone. The scores also track biological gradients such as subject age.
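For reference, Pielou evenness is the Shannon diversity of a community divided by its maximum possible value, J = H / ln(S), where S is the number of taxa present. The sketch below uses the standard definition; the function name and the convention of returning 0.0 for communities with fewer than two taxa are assumptions of this example, not details taken from the paper.

```python
import math

def pielou_evenness(counts):
    """Pielou evenness J = H / ln(S) from raw abundance counts.

    H is the Shannon diversity (natural log) and S is the number of taxa
    with non-zero abundance. J is undefined for S < 2; returning 0.0 in
    that case is a convention chosen for this sketch.
    """
    present = [c for c in counts if c > 0]
    richness = len(present)
    if richness < 2:
        return 0.0
    total = sum(present)
    shannon = -sum((c / total) * math.log(c / total) for c in present)
    return shannon / math.log(richness)

even_community = pielou_evenness([10, 10, 10, 10])   # perfectly even
skewed_community = pielou_evenness([97, 1, 1, 1])    # dominated by one taxon
```

Evenness depends on abundances, which is why its association with scores from a model that only sees presence or absence is notable.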

Susagi is competitive with another Large Microbiome Model (Microbial General Model, MGM) on diverse classification tasks, without task-specific fine-tuning and with greater parameter efficiency.

Ultimately, our model will facilitate hypothesis generation for complex microbial processes, including deterministic assembly and microbial interactions, which is crucial, for instance, in the in silico design of communities.

Availability and implementation

Evaluation code and model weights are available at https://github.com/the-puzzler/Microbiome-Modelling. Model weights can also be downloaded directly from https://huggingface.co/basilboy/microbiome-model. An interactive demo is available at https://huggingface.co/spaces/basilboy/microbiome-space.
