Automated Antibiogram Extraction from Unstructured Microbiology Reports: A Comparative Performance and Efficiency Analysis of Domain-Specific Named Entity Recognition (NER) Pipelines

Mohamed Kamal
Omneya Hassanain


Abstract

Antibiograms are essential tools in antimicrobial stewardship programs (ASPs), guiding empirical antibiotic therapy and tracking antimicrobial resistance (AMR) trends. However, manually compiling antibiograms from unstructured microbiology reports is labor-intensive and prone to delays. Here, we present a comparative study of three Natural Language Processing (NLP) approaches for automating data extraction from free-text reports: a rule-based Named Entity Recognition (NER) system, a statistical NER model built with the spaCy library, and a transformer-based question-answering (QA) model based on DistilBERT. We generated a synthetic dataset of 3,000 microbiology reports to evaluate these methods on extraction accuracy (precision, recall, F1-score) and computational efficiency. The rule-based NER achieved perfect accuracy (F1 = 1.00) with minimal computational resources, making it highly suitable for real-time deployment. The spaCy model, after domain-specific fine-tuning, also performed strongly (F1 = 1.00) and handled linguistic variation effectively. In contrast, the transformer QA model showed moderate accuracy: it excelled at extracting organism names but underperformed in detecting contamination status because of contextual ambiguities (F1 = 0.68 to 0.8). The computational efficiency analysis showed that the rule-based and spaCy NER models could process reports rapidly with limited resources, whereas the transformer QA model required substantial computational power, potentially limiting its clinical utility. Additionally, we developed a prototype expert system in R Shiny that uses the rule-based NER to feed extracted data into a real-time antibiogram dashboard, demonstrating the feasibility of these approaches in practical settings. The STEWEX (Stewardship Expert System) prototype can build a fully functional antibiogram in real time from simulated unstructured reports and simulated antibiotic susceptibility results.
In conclusion, our results suggest that while advanced NLP methods offer flexibility, rule-based NER systems provide unparalleled accuracy and efficiency for data extraction from unstructured reports in ASPs, a step that is currently a bottleneck in antibiogram development. Future efforts will focus on validating these approaches using real-world clinical data, with the ultimate goal of fully automating antibiogram generation to support data-driven antimicrobial stewardship.
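
To make the rule-based approach and its evaluation concrete, the following is a minimal sketch in Python, not the authors' implementation. The organism vocabulary, the contamination and negation patterns, and the function names `extract` and `precision_recall_f1` are all assumptions introduced for illustration; the abstract does not state which rules or phrases the study used.

```python
import re

# Hypothetical vocabulary: the abstract does not list the organisms covered.
ORGANISMS = ["Escherichia coli", "Klebsiella pneumoniae", "Staphylococcus aureus"]
ORGANISM_RE = re.compile("|".join(re.escape(o) for o in ORGANISMS), re.IGNORECASE)

# Assumed patterns: any "contamina..." word, and a negation cue shortly before
# one within the same sentence (the gap may not cross a full stop).
CONTAM_RE = re.compile(r"\bcontamina\w*", re.IGNORECASE)
NEGATED_RE = re.compile(r"\b(?:no|not|without)\b[^.]{0,30}?\bcontamina\w*", re.IGNORECASE)


def extract(report):
    """Return the first known organism and a contamination flag for one report."""
    match = ORGANISM_RE.search(report)
    organism = None
    if match:
        # Map the matched text back to its canonical spelling.
        organism = next(o for o in ORGANISMS if o.lower() == match.group(0).lower())
    contaminated = bool(CONTAM_RE.search(report)) and not NEGATED_RE.search(report)
    return {"organism": organism, "contaminated": contaminated}


def precision_recall_f1(predicted, gold):
    """Compute precision, recall, and F1 for paired binary labels."""
    pairs = list(zip(predicted, gold))
    tp = sum(1 for p, g in pairs if p and g)
    fp = sum(1 for p, g in pairs if p and not g)
    fn = sum(1 for p, g in pairs if not p and g)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

Under these assumptions, `extract("Culture grew Escherichia coli. No evidence of contamination.")` yields the organism name with the contamination flag unset, and `precision_recall_f1` can score the contamination flags against reference labels. Fixed patterns like these are cheap to run, which fits the efficiency result reported above, but they cover only the phrasings they were written for.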

Version published to 10.1101/2025.03.10.25323640v1 on medRxiv
Mar 11, 2025

Related articles

Active learning pipeline to automatically identify candidate terms for a CDSS ontology—measures, experiments, and performance

This article has 17 authors:
1. Shailesh Alluri
2. Keerthana Komatineni
3. Rohan Goli
4. Nina Hubig
5. Hua Min
6. Yang Gong
7. Dean F. Sittig
8. David Robinson
9. Paul Biondich
10. Adam Wright
11. Christian Nøhr
12. Timothy Law
13. Arild Faxvaag
14. Richard D. Boyce
15. Ronald Gimbel
16. Lior Rennert
17. Xia Jing
This article has no evaluations
Latest version Apr 17, 2025

ARGContextProfiler: Extracting and Scoring the Genomic Contexts of Antibiotic Resistance Genes using Assembly Graphs

This article has 5 authors:
1. Nazifa Ahmed Moumi
2. Shafayat Ahmed
3. Connor Brown
4. Amy Pruden
5. Liqing Zhang
This article has no evaluations
Latest version Mar 28, 2025

AOP-helpFinder 3.0: from text mining to network visualization of key event relationships, and knowledge integration from multi-sources

This article has 5 authors:
1. Thomas Jaylet
2. Florence Jornod
3. Quentin Capdet
4. Olivier Armant
5. Karine Audouze
This article has no evaluations
Latest version Apr 23, 2025