Automated Antibiogram Extraction from Unstructured Microbiology Reports: A Comparative Performance and Efficiency Analysis of Domain-Specific Named Entity Recognition (NER) Pipelines


Abstract

Antibiograms are essential tools in antimicrobial stewardship programs (ASPs), guiding empirical antibiotic therapy and tracking antimicrobial resistance (AMR) trends. However, the manual compilation of antibiograms from unstructured microbiology reports is labor-intensive and prone to delays. Here, we present a comparative study of three Natural Language Processing (NLP) approaches for automating data extraction from free-text reports: a rule-based Named Entity Recognition (NER) system, a statistical NER model built with the spaCy library, and a transformer-based question-answering (QA) model based on DistilBERT. We generated a synthetic dataset of 3,000 microbiology reports to evaluate these methods on extraction accuracy (precision, recall, F1-score) and computational efficiency. The rule-based NER achieved perfect accuracy (F1 = 1.00) with minimal computational resources, making it highly suitable for real-time deployment. The spaCy model, after domain-specific fine-tuning, also performed strongly (F1 = 1.00) and handled linguistic variation effectively. In contrast, the transformer QA model showed moderate accuracy (F1 = 0.68-0.80): it excelled at extracting organism names but underperformed in detecting contamination status because of contextual ambiguities. Computational efficiency analysis showed that the rule-based and spaCy NER models could process reports rapidly with limited resources, while the transformer QA model required substantial computational power, potentially limiting its clinical utility. Additionally, we developed a prototype expert system in R Shiny that uses the rule-based NER to feed extracted data into a real-time antibiogram dashboard, demonstrating the feasibility of these approaches in practical settings. The STEWEX (Stewardship Expert System) prototype can build a fully functional antibiogram in real time from simulated unstructured reports and simulated antibiotic susceptibility results.
In conclusion, our results suggest that while advanced NLP methods offer flexibility, rule-based NER systems provide the best combination of accuracy and efficiency for data extraction from unstructured reports in ASPs, a step that is currently a bottleneck in antibiogram development. Future efforts will focus on validating these approaches on real-world clinical data, with the ultimate goal of fully automating antibiogram generation to support data-driven antimicrobial stewardship.
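
To make the rule-based approach and its evaluation metrics concrete, the following Python sketch shows one way such an extractor and scorer could look. It is an illustration under stated assumptions, not the pipeline evaluated in the study: the organism list, the regular expressions, the report wording, and the function names `extract` and `precision_recall_f1` are all hypothetical.

```python
import re

# Hypothetical organism vocabulary; a real system would use a curated list.
ORGANISMS = ["Escherichia coli", "Staphylococcus aureus", "Klebsiella pneumoniae"]
ORGANISM_RE = re.compile("|".join(re.escape(o) for o in ORGANISMS), re.IGNORECASE)

# Assumed phrasing rules: any "contaminat..." word signals contamination,
# unless a negation word precedes it within the same sentence.
CONTAM_RE = re.compile(r"\bcontaminat", re.IGNORECASE)
NEGATED_CONTAM_RE = re.compile(r"\b(?:no|not|without)\b[^.]*\bcontaminat", re.IGNORECASE)


def extract(report):
    """Extract the organism name and contamination status from one report."""
    match = ORGANISM_RE.search(report)
    organism = None
    if match:
        # Map the matched text back to its canonical spelling.
        organism = next(o for o in ORGANISMS if o.lower() == match.group(0).lower())
    contaminated = bool(CONTAM_RE.search(report)) and not NEGATED_CONTAM_RE.search(report)
    return {"organism": organism, "contaminated": contaminated}


def precision_recall_f1(predicted, gold):
    """Micro-averaged scores over sets of (report_id, field, value) tuples."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

Fixed patterns like these are cheap to run and fully predictable, which fits the efficiency result reported above, but they only cover phrasings that were anticipated in advance.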
