Optimized BERT-based NLP outperforms Zero-Shot Methods for Automated Symptom Detection in Clinical Practice

Juan G. Diaz Ochoa
Natalie Layer
Jonas Mahr
Faizan E Mustafa
Christian U. Menzel
Martina Müller
Tobias Schilling
Gerald Illerhaus
Markus Knott
Alexander Krohn

Abstract

Background

Large Language Models (LLMs) have raised broad expectations for clinical use, particularly for processing complex medical narratives. In practice, however, more targeted Natural Language Processing (NLP) approaches may offer higher precision and feasibility for extracting symptoms from real-world clinical texts. NLP provides promising tools for extracting clinical information from unstructured medical narratives, yet few studies have focused on integrating symptom information from German free text, particularly for complex patient groups such as emergency department (ED) patients. The ED setting presents specific challenges: high documentation pressure, heterogeneous language styles, and the need for secure, locally deployable models due to strict data protection regulations. Furthermore, German remains a low-resource language in clinical NLP.

Methods

We implemented and compared two zero-shot models (GLiNER and Mistral) with a fine-tuned BERT-based model, SCAI-BIO/BioGottBERT, for named entity recognition (NER) of symptoms, anatomical terms, and negations in German ED anamnesis texts. Manual annotations of 150 narratives were used for model validation. The postprocessing steps included confidence-based filtering, negation exclusion, symptom standardization, and integration with structured oncology registry data. All computations were performed on-premises on local hospital servers to ensure full data protection compliance.
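The first three postprocessing steps named above (confidence-based filtering, negation exclusion, symptom standardization) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The `Entity` structure, the label names, the 0.5 threshold, the sentence-level negation rule, and the `SYNONYMS` table are all assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    text: str      # surface form as found in the narrative
    label: str     # "SYMPTOM", "ANATOMY" or "NEGATION" (assumed label names)
    score: float   # model confidence in [0, 1]
    sentence: int  # index of the sentence containing the entity


# Hypothetical lookup table mapping surface forms to a standard term.
SYNONYMS = {
    "bauchschmerzen": "abdominal pain",
    "bauchweh": "abdominal pain",
    "fieber": "fever",
    "übelkeit": "nausea",
}


def affirmed_symptoms(entities, min_score=0.5):
    """Return standardized, non-negated symptoms in order of appearance."""
    # Step 1: confidence-based filtering (threshold is an assumption).
    confident = [e for e in entities if e.score >= min_score]

    # Step 2: negation exclusion. Assumed rule: a symptom counts as negated
    # when a NEGATION entity occurs in the same sentence.
    negated_sentences = {e.sentence for e in confident if e.label == "NEGATION"}

    result = []
    for e in confident:
        if e.label != "SYMPTOM" or e.sentence in negated_sentences:
            continue
        # Step 3: standardization via dictionary lookup, falling back to
        # the lowercased surface form when no mapping is known.
        term = SYNONYMS.get(e.text.lower(), e.text.lower())
        if term not in result:
            result.append(term)
    return result


# Toy input, roughly "Bauchschmerzen und Übelkeit. Kein Fieber."
example = [
    Entity("Bauchschmerzen", "SYMPTOM", 0.93, 0),
    Entity("Übelkeit", "SYMPTOM", 0.31, 0),   # dropped: low confidence
    Entity("Kein", "NEGATION", 0.88, 1),
    Entity("Fieber", "SYMPTOM", 0.91, 1),     # dropped: negated sentence
]
print(affirmed_symptoms(example))
```

A sentence-level negation rule is deliberately coarse; a real pipeline would more likely tie each negation cue to the specific entity it scopes over.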

Results

The fine-tuned SCAI-BIO/BioGottBERT model outperformed both zero-shot approaches, achieving an F1 score of 0.84 for symptom extraction and demonstrating superior performance in negation detection. The validated pipeline enabled systematic extraction of affirmed symptoms from ED free text, transforming them into structured data. This method allows large-scale analysis of symptom profiles across patient populations and serves as a technical foundation for symptom-based clustering and subgroup analysis.
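For readers unfamiliar with the metric, the F1 score is the harmonic mean of precision and recall. The sketch below computes it at entity level against manual annotations. The abstract does not state the matching criterion used, so exact matching on `(start, end, label)` spans is an assumption here, and the spans are invented toy data, not study results.

```python
def span_f1(gold, predicted):
    """Precision, recall and F1 under exact span matching (assumed criterion)."""
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0, 0.0, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy spans: (start offset, end offset, label)
gold_spans = {(0, 14, "SYMPTOM"), (20, 26, "SYMPTOM"), (30, 38, "SYMPTOM")}
pred_spans = {(0, 14, "SYMPTOM"), (20, 26, "SYMPTOM"), (40, 45, "SYMPTOM")}

precision, recall, f1 = span_f1(gold_spans, pred_spans)
print(f"P={precision:.2f} R={recall:.2f} F1={f1:.2f}")
```

Relaxed (overlap-based) matching would generally give higher scores than exact matching, so the criterion matters when comparing reported values across studies.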

Conclusions

Our study demonstrates that modern NLP methods can reliably extract clinical symptoms from German ED free text, even under strict data protection constraints and with limited training resources. Fine-tuned models offer a precise and practical solution for integrating unstructured narratives into clinical decision-making. This work lays the methodological foundation for systematically analyzing large patient cohorts on the basis of free-text data. Beyond symptoms, the approach can be extended to extracting diagnoses, procedures, or other clinically relevant entities. Building upon this framework, in a subsequent study we apply network-based clustering methods to identify clinically meaningful patient subgroups and explore sex- and age-specific patterns in symptom expression.

Version published to 10.1101/2025.04.21.25326037v1 on medRxiv
Apr 22, 2025

