An LLM-Based Comparison of Ambient AI Scribes for Clinical Documentation
Abstract
Ambient AI scribes have become an increasingly promising option for automating clinical documentation, with dozens of enterprise solutions now available. However, it remains uncertain whether models with domain-specific tuning outperform naïve foundation models “out of the box.” This study evaluated five commercial AI scribes, a custom solution built on the GPT-o1 base model without fine-tuning, and an experienced human scribe across a series of simulated clinical encounters. The notes each produced were scored by large language models (LLMs) using a rubric assessing completeness, organization, accuracy, complexity handling, conciseness, and adaptability. Our naïve solution achieved scores comparable to those of industry-leading solutions across all rubric dimensions. These findings suggest that the added value of domain-specific training in ambient AI medical scribes may be limited relative to base foundation models.
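To make the LLM-as-judge evaluation concrete, the sketch below shows one way a judge model could apply the six-dimension rubric to a generated note. It is a minimal illustration, not the authors' pipeline: the judge model name, prompt wording, scoring scale, and JSON output contract are assumptions made for this example.

```python
# Illustrative LLM-as-judge rubric scoring (not the study's actual code).
# Assumes an OpenAI-compatible chat API; model name, prompt, and 1-5 scale
# are hypothetical choices for illustration only.
import json
from openai import OpenAI

RUBRIC_DIMENSIONS = [
    "completeness", "organization", "accuracy",
    "complexity_handling", "conciseness", "adaptability",
]

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def score_note(transcript: str, note: str, model: str = "gpt-4o") -> dict:
    """Ask a judge LLM to rate one clinical note on each rubric dimension."""
    prompt = (
        "You are evaluating a clinical note generated from an encounter transcript.\n"
        f"Rate the note from 1 (poor) to 5 (excellent) on each dimension: "
        f"{', '.join(RUBRIC_DIMENSIONS)}.\n"
        "Respond with a JSON object mapping each dimension to an integer score.\n\n"
        f"TRANSCRIPT:\n{transcript}\n\nNOTE:\n{note}"
    )
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0,  # deterministic scoring for reproducibility
        response_format={"type": "json_object"},
    )
    return json.loads(response.choices[0].message.content)
```

In a full evaluation, each scribe's note for each simulated encounter would be scored this way (ideally with repeated runs or multiple judge models) and the per-dimension scores aggregated for comparison.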