Conversational Speech for Respiratory Triage in Primary Care: A Pilot Study

Vijay Ravi
Camille Noufi


Abstract

Background

Respiratory complaints account for a substantial share of adult ambulatory care visits, and triaging them accurately has direct consequences for antibiotic stewardship and pathogen-specific therapy. Prior work has investigated voice as a triage signal, but that literature is dominated by single-condition detection from scripted speech in crowdsourced or controlled clinical settings and has not been evaluated at primary care scale on conversational ambient audio.

Methods

A dataset of 514,377 ambient-recorded primary care visits from 379,225 adult patients at a US clinic network was used, with per-visit clinically assigned ICD-10 diagnosis codes and de-identified demographic and geographic metadata. Patient audio was extracted from each doctor-patient conversation, and spectral, voice quality, and prosodic features were computed. Eleven binary classification tasks were defined, aligned with a respiratory triage cascade (e.g., acute respiratory versus acute non-respiratory illness, and lower versus upper respiratory tract infection). An acoustic model (feed-forward network) was trained independently for each task using patient-stratified five-fold cross-validation and evaluated on a held-out test set. Each task’s model was also compared against six non-acoustic baselines, each built on a single demographic, geographic, or temporal variable. The 11 trained classifiers were then composed into a hierarchical cascade, which is illustrated through case studies on selected patients.
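The patient-level split described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes that "patient-stratified" means every visit by the same patient lands in the same fold, and the names `assign_fold`, `patient_grouped_folds`, and the toy visit and patient IDs are hypothetical.

```python
import hashlib
from collections import defaultdict


def assign_fold(patient_id: str, n_folds: int = 5) -> int:
    """Map a patient ID to a fold deterministically.

    Hashing the ID (an assumption made for this sketch) guarantees that
    all visits by one patient share a fold, so no patient appears in
    both the training and validation data of a given split.
    """
    digest = hashlib.sha256(patient_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % n_folds


def patient_grouped_folds(visits, n_folds: int = 5):
    """Group visit IDs by fold.

    visits: iterable of (visit_id, patient_id) pairs.
    Returns a dict mapping fold index to a list of visit IDs.
    """
    folds = defaultdict(list)
    for visit_id, patient_id in visits:
        folds[assign_fold(patient_id, n_folds)].append(visit_id)
    return dict(folds)


# Toy data: patients p1 and p3 each have two visits.
visits = [("v1", "p1"), ("v2", "p1"), ("v3", "p2"), ("v4", "p3"), ("v5", "p3")]
folds = patient_grouped_folds(visits)
```

Because the dataset contains more visits (514,377) than patients (379,225), some patients contribute several visits, which is why a split at the patient level rather than the visit level matters for avoiding leakage.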

Results

Test-set AUC across the 11 tasks ranged from 0.602 (95% CI: 0.588–0.614) to 0.745 (95% CI: 0.742–0.748), with a mean expected calibration error of 0.018. Six of the eleven binary tasks outperformed all confounder baselines. Of the remaining five, four showed a median within-stratum AUC of 0.62–0.70 when the confounder was held fixed, indicating acoustic discrimination beyond what the confounder alone explains. The exception was the pneumonia versus non-pneumonia lower respiratory tract infection binary, which failed against the patient-city confounder baseline, plausibly reflecting a clinic-level difference in ICD-10 coding.
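The three quantities reported above (AUC, expected calibration error, and median within-stratum AUC) can be sketched in a few lines. The code below uses common textbook definitions and is an assumption about how such metrics are typically computed, not the study's actual evaluation code: the equal-width binning, the number of bins, and all function names are hypothetical choices.

```python
from collections import defaultdict
from statistics import median


def auc(scores, labels):
    """AUC as the probability that a positive outscores a negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def expected_calibration_error(probs, labels, n_bins=10):
    """Weighted mean gap between predicted probability and observed positive rate.

    Uses equal-width probability bins (an assumed binning scheme).
    """
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[idx].append((p, y))
    total = len(probs)
    ece = 0.0
    for b in bins:
        if not b:
            continue
        mean_prob = sum(p for p, _ in b) / len(b)
        pos_rate = sum(y for _, y in b) / len(b)
        ece += (len(b) / total) * abs(mean_prob - pos_rate)
    return ece


def median_within_stratum_auc(scores, labels, strata):
    """Median AUC over confounder strata, holding the confounder fixed in each.

    Strata that lack either class are skipped because AUC is undefined there.
    """
    groups = defaultdict(list)
    for s, y, g in zip(scores, labels, strata):
        groups[g].append((s, y))
    aucs = []
    for rows in groups.values():
        ys = [y for _, y in rows]
        if 0 < sum(ys) < len(ys):
            aucs.append(auc([s for s, _ in rows], ys))
    return median(aucs)
```

Computing AUC separately inside each stratum removes any discrimination that comes purely from the confounder, since the confounder takes a single value within a stratum; whatever separation remains must come from the model's other inputs.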

Conclusion

Conversational primary care audio carries acoustic signal that discriminates clinically meaningful respiratory contrasts. Absolute performance is moderate, but the conditions are stricter than prior work: conversational speech and differential-diagnosis contrasts among sick patients. This pilot study is a baseline for voice-based clinical AI moving beyond sick-versus-healthy detection toward differential-diagnosis panels and a proof-of-concept for hierarchical reasoning.

Version published to 10.64898/2026.06.09.26355284 on medRxiv
Jun 11, 2026

Related articles

Cross-Model Variability in Large Language Model Triage Behavior for Potential Stroke Symptoms

This article has 4 authors:
1. Daniel A Dworkis
2. Jon Stenstrom
3. Ayan Sen
4. Richard T Lucarelli
This article has no evaluations. Latest version: May 25, 2026

Towards A Foundation Model for Clinical Voice Biomarkers

This article has 8 authors:
1. Olivier Elemento
2. Alexandros Sigaras
3. Joseph T. Colonel
4. Iman Hajirasouliha
5. Satrajit S. Ghosh
6. Yael Bensoussan
7. Bridge2AI-Voice Consortium
8. Anaïs Rameau
This article has no evaluations. Latest version: May 30, 2026

Does Recording Hardware Matter for Clinical Speech Recognition? Evaluating ASR Performance Across Consumer Devices

This article has 9 authors:
1. Brian D. Tran
2. Di Hu
3. Seungjun Kim
4. Yawen Guo
5. Ramya Mangu
6. Tera L Reynolds
7. Jennifer Elston Lafata
8. Ming Tai-Seale
9. Kai Zheng
This article has no evaluations. Latest version: May 22, 2026
