An LLM-Based Comparison of Ambient AI Scribes for Clinical Documentation
Abstract
Ambient AI scribes have become an increasingly promising option for automating clinical documentation, with dozens of enterprise solutions now available. However, it remains uncertain whether models with domain-specific tuning outperform naïve foundation models “out of the box.” This study evaluated five commercial AI scribes, a custom solution built on the GPT-o1 base model without fine-tuning, and an experienced human scribe across a series of simulated clinical encounters. The notes each produced were scored by large language models (LLMs) using a rubric assessing completeness, organization, accuracy, complexity handling, conciseness, and adaptability. Our naïve solution achieved scores comparable to those of industry-leading solutions across all rubric dimensions. These findings suggest that the added value of domain-specific training in ambient AI medical scribes may be limited relative to base foundation models.
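To make the LLM-as-judge evaluation concrete, the sketch below shows one way a judge model could apply the six-dimension rubric to a generated note. It is a minimal illustration, not the authors' pipeline: the judge model name, prompt wording, scoring scale, and JSON output contract are assumptions made for this example.

```python
# Illustrative LLM-as-judge rubric scoring (not the study's actual code).
# Assumes an OpenAI-compatible chat API; model name, prompt, and 1-5 scale
# are hypothetical choices for illustration only.
import json
from openai import OpenAI

RUBRIC_DIMENSIONS = [
    "completeness", "organization", "accuracy",
    "complexity_handling", "conciseness", "adaptability",
]

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def score_note(transcript: str, note: str, model: str = "gpt-4o") -> dict:
    """Ask a judge LLM to rate one clinical note on each rubric dimension."""
    prompt = (
        "You are evaluating a clinical note generated from an encounter transcript.\n"
        f"Rate the note from 1 (poor) to 5 (excellent) on each dimension: "
        f"{', '.join(RUBRIC_DIMENSIONS)}.\n"
        "Respond with a JSON object mapping each dimension to an integer score.\n\n"
        f"TRANSCRIPT:\n{transcript}\n\nNOTE:\n{note}"
    )
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0,  # deterministic scoring for reproducibility
        response_format={"type": "json_object"},
    )
    return json.loads(response.choices[0].message.content)
```

In a full evaluation, each scribe's note for each simulated encounter would be scored this way (ideally with repeated runs or multiple judge models) and the per-dimension scores aggregated for comparison.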