MARISMa: a routine MALDI-TOF MS dataset from 2018 to 2024 from Spain
Abstract
Clinical microbiology laboratories play a crucial role in identifying pathogens, guiding antibiotic treatment, and managing antimicrobial resistance (AMR). Matrix-Assisted Laser Desorption/Ionization Time-of-Flight Mass Spectrometry (MALDI-TOF MS) has become essential for rapid, accurate, and cost-effective microbial identification. Recent advances in integrating MALDI-TOF MS with Artificial Intelligence (AI) show promise for improving microbial detection and AMR prediction. However, progress is limited by the lack of comprehensive, openly accessible datasets, which restricts the validation, reproducibility, and applicability of models.
To address this gap, we introduce a publicly available MALDI-TOF MS dataset comprising 202,700 unique spectra from isolates collected between 2018 and 2024 at the Hospital General Universitario Gregorio Marañón, Spain. The dataset includes 186,213 bacterial, 16,163 fungal, and 371 mycobacterial samples, of which 29,679 contain AMR annotations. This resource is openly and freely shared, rigorously curated, and designed to support a wide range of machine learning applications. By ensuring unrestricted access to high-quality, standardized data, the dataset aims to promote transparency, reproducibility, comparative benchmarking, and collaborative progress in AI-driven clinical microbiology.