Ambient AI Documentation in Mixed-Language Encounters: A Heuristic Evaluation of Spanish–English and Mandarin–English Conversations
Abstract
Ambient AI documentation systems rely on automatic speech recognition to transcribe patient–provider conversations before generating clinical notes. However, little empirical evidence exists on how these systems perform in mixed-language clinical encounters. We conducted a mixed-method heuristic evaluation of an ambient AI documentation tool using 24 reenacted primary care conversations involving Spanish–English and Mandarin–English code-switching. Quantitative analyses measured mixed error rate (MER) and code-switching detection. Overall MER was low: the median was 4% in Spanish–English conversations, which also showed less variation, and 9% in Mandarin–English conversations, though outliers reached 67%. The system generally detected language switches reliably, although deletions occurred frequently at switch points in Mandarin–English transcripts. Qualitative analysis revealed transcription errors related to phonetic similarity, automatic language translation, clinical terminology recognition, and language-specific challenges. These findings highlight considerations for improving ambient AI clinical documentation systems so that they support multilingual providers in delivering care to linguistically diverse populations.
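The abstract does not spell out how MER is computed. As a rough illustration only, the sketch below follows a convention common in code-switching speech recognition work: count each Mandarin character as one token and each whitespace-delimited Spanish or English word as one token, then divide the token-level edit distance by the reference length. The tokenization rule, the function names, and the example utterances are assumptions for illustration, not the authors' method or data.

```python
import re

# Assumed tokenization: each CJK ideograph is one token; any other run of
# non-space characters (a Spanish/English word, punctuation included) is one token.
_TOKEN = re.compile(r"[\u4e00-\u9fff]|[^\s\u4e00-\u9fff]+")


def tokenize(text):
    """Split mixed-language text into characters (Mandarin) and words (other)."""
    return _TOKEN.findall(text.lower())


def edit_distance(ref, hyp):
    """Levenshtein distance between two token lists (substitution, insertion, deletion)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion from the reference
                           cur[j - 1] + 1,           # insertion into the hypothesis
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1]


def mixed_error_rate(reference, hypothesis):
    """Token-level edit distance divided by the number of reference tokens."""
    ref = tokenize(reference)
    hyp = tokenize(hypothesis)
    if not ref:
        raise ValueError("reference must contain at least one token")
    return edit_distance(ref, hyp) / len(ref)


# Made-up utterances, not taken from the study.
mandarin_english = mixed_error_rate("我今天头痛 take ibuprofen", "我今天头痛 take aspirin")
spanish_english = mixed_error_rate("tengo dolor de cabeza today", "tengo dolor cabeza today")
```

In the first made-up pair, one of seven reference tokens is substituted; in the second, one of five is deleted. Because insertions count as errors, a rate computed this way can exceed 100% when the hypothesis is much longer than the reference.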