Ambient AI Documentation in Mixed-Language Encounters: A Heuristic Evaluation of Reenacted Mandarin–English and Spanish–English Clinical Conversations

Di Hu
Daniel Flores
Lidia Flores
Ruby Chien
Kyle Lam
Emilie Chow
Yawen Guo
Steven Tam
Danielle Perret
Deepti Pandita
Kai Zheng

Abstract

Ambient AI documentation systems rely on automatic speech recognition to transcribe patient–provider conversations before generating clinical notes. However, little evidence exists on how these systems perform in mixed-language clinical encounters. We conducted a mixed-methods heuristic evaluation of an ambient AI documentation tool using 24 reenacted primary care conversations: 12 Mandarin–English conversations developed from real-world encounter excerpts and 12 adapted Spanish–English counterparts. Quantitative analyses measured mixed error rate (MER) and code-switching. Overall MER was low, with a median of 4% (and less variation) in Spanish–English conversations and 9% in Mandarin–English conversations, although outliers reached 67%. The system generally detected language switches reliably, although deletions occurred frequently at switch points in Mandarin–English transcripts. Qualitative analysis revealed transcription errors related to phonetic similarity, automatic translation, clinical terminology recognition, and language-specific challenges. These findings highlight considerations for improving ambient AI tools to support multilingual providers in delivering care for linguistically diverse populations.
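
The abstract does not spell out how MER was computed. As a rough illustration only, the sketch below follows one common convention for mixed-language transcripts: each Chinese character counts as one token, each English or Spanish word counts as one token, and MER is the token-level edit distance (substitutions, deletions, and insertions) divided by the number of reference tokens. The function names `mixed_tokens`, `edit_distance`, and `mixed_error_rate` are hypothetical, and the tokenization rule is an assumption rather than the authors' method.

```python
import re

# Assumed tokenization (not taken from the paper): one token per CJK
# character, one token per run of non-CJK word characters (for example an
# English or Spanish word). Punctuation and whitespace are dropped.
_TOKEN = re.compile(r"[\u4e00-\u9fff]|[^\W\u4e00-\u9fff]+")


def mixed_tokens(text: str) -> list[str]:
    """Split mixed-language text into character/word tokens, lowercased."""
    return _TOKEN.findall(text.lower())


def edit_distance(ref: list[str], hyp: list[str]) -> int:
    """Token-level Levenshtein distance between reference and hypothesis."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def mixed_error_rate(reference: str, hypothesis: str) -> float:
    """(substitutions + deletions + insertions) / reference token count."""
    ref = mixed_tokens(reference)
    hyp = mixed_tokens(hypothesis)
    if not ref:
        raise ValueError("reference must contain at least one token")
    return edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    # Made-up utterances for illustration; not data from the study.
    print(mixed_error_rate("我有 diabetes", "我 diabetes"))
    print(mixed_error_rate("tengo dolor de cabeza", "tengo dolor cabeza"))
```

Under this convention a single dropped Mandarin character at a language switch counts as one deletion, so short utterances can show large MER values from a single error.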

Version published to 10.64898/2026.05.19.26353603 on medRxiv
May 22, 2026

Related articles

Does Recording Hardware Matter for Clinical Speech Recognition? Evaluating ASR Performance Across Consumer Devices

This article has 9 authors:
1. Brian D. Tran
2. Di Hu
3. Seungjun Kim
4. Yawen Guo
5. Ramya Mangu
6. Tera L Reynolds
7. Jennifer Elston Lafata
8. Ming Tai-Seale
9. Kai Zheng
This article has no evaluations. Latest version: May 22, 2026

Language-dependent diagnostic safety of medical AI systems: a cross-lingual benchmarking and prospective clinical study

This article has 16 authors:
1. Yuqian Wang
2. Hongyu He
3. Rongpeng Zhu
4. Yunyi Lu
5. Pawit Phadungsaksawasdi
6. Manqiang Peng
7. Zengping Liu
8. Ke Zou
9. Ye Zhang
10. Sien Ping Chew
11. Yih Chung Tham
12. Arian Khorasani
13. Hao Deng
14. Ching-Yu Cheng
15. Jie Yang
16. Dianbo Liu
This article has no evaluations. Latest version: May 21, 2026

Automated Interpretation of EEG Reports Using a Large Language Model with Structured Confidence Outputs

This article has 12 authors:
1. Wanying Tian
2. Steven Bergner
3. Alexander Moiseev
4. Fred Popowich
5. George Medvedev
6. Mark P. Richardson
7. Roman Rodionov
8. Pengcheng Xi
9. Sam M. Doesburg
10. Urs Ribary
11. Joel S. Winston
12. Vasily A. Vakorin
This article has no evaluations. Latest version: Jul 10, 2026
