PanSpace: Fast and Scalable Indexing for Massive Bacterial Databases

Jorge Avila Cartes
Simone Ciccolella
Luca Denti
Raghuram Dandinasivara
Gianluca Della Vedova
Paola Bonizzoni
Alexander Schönhuth


Abstract

Motivation

Species identification is a critical task in agriculture, food processing, and healthcare. The rapid growth of genomic databases, driven in part by the increasing investigation of bacterial genomes in clinical microbiology, has outpaced the capabilities of conventional tools such as BLAST for basic search and query tasks. A key bottleneck in microbiome studies is building indexes that allow rapid species identification and classification from assemblies while scaling efficiently to massive resources such as the AllTheBacteria database, so that large-scale analyses can be performed even on a common laptop.

Results

We introduce PanSpace, the first convolutional neural network–based approach that leverages dense vector (embedding) indexing, scalable to billions of embeddings, for indexing and querying massive bacterial genome databases. PanSpace is specifically designed to classify bacterial draft assemblies. Compared to the most recent and competitive tool for this task, PanSpace requires only ~2 GB of disk space to index the AllTheBacteria database, an 8× reduction relative to existing methods. Moreover, it delivers ultra-fast query performance, processing more than 1,000 assemblies in less than two and a half minutes, while preserving the accuracy of state-of-the-art approaches.
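To illustrate the general idea of querying a dense vector index, the sketch below runs a brute-force cosine-similarity search over a handful of toy embeddings with NumPy. It is not PanSpace's implementation or API: the function names `build_index` and `query`, the three-dimensional vectors, and the species labels are all hypothetical, and a system scaling to billions of embeddings would presumably rely on an approximate nearest-neighbor index rather than an exhaustive scan.

```python
import numpy as np


def build_index(embeddings):
    """Stack embeddings and L2-normalize each row (hypothetical helper).

    After normalization, the inner product of two rows equals their
    cosine similarity.
    """
    emb = np.asarray(embeddings, dtype=np.float32)
    norms = np.linalg.norm(emb, axis=1, keepdims=True)
    return emb / np.clip(norms, 1e-12, None)


def query(index, labels, query_embedding, k=1):
    """Return the k most similar reference labels with their scores."""
    q = np.asarray(query_embedding, dtype=np.float32)
    q = q / max(float(np.linalg.norm(q)), 1e-12)
    scores = index @ q                 # cosine similarity to every reference
    top = np.argsort(-scores)[:k]      # indices of the k highest scores
    return [(labels[i], float(scores[i])) for i in top]


# Toy data standing in for embeddings produced by an encoder network.
reference_embeddings = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
species_labels = ["species_A", "species_B", "species_C"]

index = build_index(reference_embeddings)
hits = query(index, species_labels, [0.1, 0.9, 0.0], k=2)
for label, score in hits:
    print(f"{label}\t{score:.3f}")
```

In this toy setting the query is assigned the label of its nearest reference embedding; storing one fixed-length vector per genome, rather than the sequence itself, is what keeps an index of this kind small.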

Availability

PanSpace is available at https://github.com/pg-space/panspace.

Version published to 10.1101/2025.03.19.644115 on bioRxiv
Mar 19, 2025

