There’s not an APP for That: Comparing Interpretation Through an AI Voice App and Qualified Medical Interpreters in Real World Clinical Settings

Iris Feinberg
Heewon Lee-Laminack
Elizabeth L. Tighe
Ifedola Owoeye

Abstract

Background/Objectives: AI voice interpretation applications are increasingly used in clinical settings to address language access challenges, yet evidence comparing their performance to qualified in-person medical interpreters in authentic clinical encounters remains limited. This study compares the linguistic and clinical accuracy of AI-based voice interpretation and certified in-person medical interpretation using recorded real-world clinical encounters.

Methods: Outpatient clinical encounters involving patients with limited English proficiency were audio-recorded. Fourteen physician speech segments (mean length 78.5 words) representing common diagnostic, treatment, and counseling content were extracted and translated into seven languages using both an AI voice interpretation application and certified in-person medical interpreters. Bilingual reviewers and a clinician evaluated translations for accuracy, completeness, and clinical fidelity. Qualitative analyses examined error patterns and contextual loss; quantitative comparisons assessed error rate differences across languages and interpretation conditions.

Results: AI voice app translations exhibited significantly higher linguistic errors (χ²(1) = 19.78, p < .001) and clinical accuracy errors (χ²(1) = 45.07, p < .001) than qualified medical interpreter translations. Clinical error rates were 33.3% for AI-generated versus 4.8% for interpreter-generated translations. Error rates were also higher for less commonly spoken languages compared to commonly spoken languages when using the AI voice app (42.9% vs. 14.3%).

Conclusions: Qualified in-person interpreters remain essential for safe, accurate clinical communication. Hybrid models integrating professional interpretation with appropriately deployed AI technology may offer a balanced approach to expanding language access while maintaining communication safety and equity.

Version published to 10.20944/preprints202603.2013.v1
Mar 25, 2026

Related articles

Large Language Models as Ophthalmic Patient Educators: A Comparative Evaluation of Readability, Understandability, and Actionability

This article has 3 authors:
1. Shivam Chandra
2. Vineet Kumar
3. Patrianakos Thomas
This article has no evaluations. Latest version: Mar 20, 2026

Development and Prospective Validation of CPX-MATE: An End-to-End Medical Education Platform Integrating Voice-Based Virtual Patient Simulation and Automated Real-time Evaluation

This article has 12 authors:
1. Ji Woo Song
2. Minseong Kim
3. Chanhee Hong
4. Young Sam Kim
5. Junho Cho
6. Ji Hoon Kim
7. Jinwoo Myung
8. Arom Choi
9. Hanna Yoon
10. Stephen Gyung Won Lee
11. Seng Chan You
12. Chaeryoung Park
This article has no evaluations. Latest version: Mar 8, 2026

Large Language Models for Automated ICD-10 Coding of Obstetric Clinical Notes in Portuguese: Comparison With Human Coders

This article has 6 authors:
1. Ricardo da Silva Santos
2. Murilo Gleyson Gazzola
3. Paulo Marcelino Figueira
4. Adriana Gomes Luz
5. Rodolfo de Carvalho Pacagnella
6. Cristiano Torezzan
This article has no evaluations. Latest version: Feb 5, 2026
