Comparison of Large Language Models Versus Traditional Information Extraction Methods for Real World Evidence of Patient Symptomatology in Acute and Post-Acute Sequelae of SARS-CoV-2
Abstract
Patient symptoms play a critical role in disease progression and diagnosis, yet they are most often captured in unstructured clinical notes. This study explores the use of large language models (LLMs) for extracting patient symptoms from clinical notes, comparing their performance against rule-based information extraction (IE) systems such as BioMedICUS. By fine-tuning an LLM on diverse corpora from multiple healthcare institutions, we aimed to improve the accuracy and efficiency of extracting symptoms related to acute and post-acute sequelae of SARS-CoV-2. We also conducted a prevalence analysis to highlight significant differences in symptom prevalence across corpora and performed a fairness analysis to assess the model's equity across race and gender. Our findings indicate that while LLMs can match the effectiveness of rule-based IE methods, they face significant challenges related to demographic bias and generalizability arising from variability in training corpora. The evaluation revealed overfitting and insufficient generalization, especially when models were trained predominantly on limited datasets subject to single-annotator bias. The study also showed that LLMs offer substantial advantages in efficiency, adaptability, and scalability for biomedical IE, marking a transformative shift in clinical data extraction processes. These results provide real-world evidence of the need for diverse, high-quality training datasets and robust annotation processes to improve LLM performance and reliability in clinical applications.