Harnessing Natural Language Processing for Automated Exposure Therapy Coding in Youth with OCD
Abstract
Objective: To develop and evaluate an automated classification system for labeling Exposure Process Coding System quality codes -- specifically exposure and encourage events -- during in-person exposure therapy sessions, using automatic speech recognition (ASR) and natural language processing techniques.

Methods: The system was trained and tested on 360 manually labeled pediatric OCD therapy sessions from three clinical trials. Audio data were processed with ASR tools (OpenAI's Whisper and Google Speech-to-Text), and manual transcriptions of two-minute audio segments were compared against the ASR-generated transcripts to assess transcription accuracy via word error rate (WER). The resulting text was analyzed with transformer-based models, including BERT, SBERT, and Meta Llama 3. Two classification settings were explored: sequence-level classification, in which events are labeled within delimited text chunks, and token-level classification, in which event boundaries are unknown. Classification was performed either by fine-tuning the transformer-based models or by applying logistic regression to the embeddings each model produced.

Results: Whisper outperformed Google Speech-to-Text, with a lower WER (0.31 vs. 0.51). In the sequence-level setting, Llama 3 models achieved the highest performance, with AUC scores of 0.95 for exposure events and 0.75 for encourage events, outperforming traditional methods and standard BERT models. In the token-level setting, fine-tuned BERT models performed best, achieving AUC scores of 0.85 for exposure events and 0.75 for encourage events.

Conclusion: Automated quality coding of in-person exposure therapy sessions is feasible with current ASR and transformer-based models. These findings suggest potential for real-time quality assessment in clinical practice and for scalable research on effective therapy methods. Future work should focus on optimization, including improvements in ASR accuracy, expanded training datasets, and multimodal data integration.
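The WER comparison described in the Methods can be illustrated with a short sketch. This is not the study's pipeline; it assumes the open-source jiwer package, and the two transcripts are hypothetical placeholders standing in for a manual reference and an ASR hypothesis.

```python
# Sketch: word error rate (WER) between a manual reference transcript and an ASR hypothesis.
# Assumes the jiwer package; the transcript strings are hypothetical placeholders.
import string

import jiwer


def normalize(text: str) -> str:
    """Lowercase and strip punctuation so both transcripts are scored on the same footing."""
    return text.lower().translate(str.maketrans("", "", string.punctuation))


reference = "Let's try touching the doorknob, and then wait before washing your hands."
hypothesis = "lets try touching the door knob then wait before washing your hands"

wer = jiwer.wer(normalize(reference), normalize(hypothesis))
print(f"WER: {wer:.2f}")  # lower is better; 0 means the ASR output matches the manual transcript exactly
```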
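For the sequence-level setting, the abstract mentions logistic regression on embeddings produced by each model. A minimal sketch of that idea, assuming the sentence-transformers and scikit-learn packages and a small hypothetical set of labeled chunks (1 = chunk contains an exposure event), might look like this; it is not the study's implementation.

```python
# Sketch: sequence-level classification of delimited text chunks via sentence embeddings + logistic regression.
# Assumes sentence-transformers and scikit-learn; the labeled chunks below are hypothetical.
from sentence_transformers import SentenceTransformer
from sklearn.linear_model import LogisticRegression

# Hypothetical transcript chunks with binary labels (1 = exposure event present, 0 = absent).
chunks = [
    "Okay, let's touch the doorknob together and see what happens to the anxiety.",
    "How was school this week? Anything fun planned for the weekend?",
    "I want you to hold the sticky substance for one more minute, you can do it.",
    "Let's schedule the next session for Thursday at four.",
]
labels = [1, 0, 1, 0]

# Encode each chunk into a fixed-length embedding with a pretrained SBERT model.
encoder = SentenceTransformer("all-MiniLM-L6-v2")
embeddings = encoder.encode(chunks)

# Fit a simple linear classifier on the embeddings.
clf = LogisticRegression(max_iter=1000).fit(embeddings, labels)

# Score a new, unseen chunk for the probability that it contains an exposure event.
new_chunk = "Great job, now try keeping your hand on the floor for thirty more seconds."
prob = clf.predict_proba(encoder.encode([new_chunk]))[0, 1]
print(f"P(exposure) = {prob:.2f}")
```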
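The token-level setting, where event boundaries are unknown, amounts to tagging each token of an undelimited transcript as inside or outside an event. The sketch below only shows the mechanics with the transformers library: the classification head is freshly initialized rather than fine-tuned as in the study, so its predictions are meaningless and purely illustrative, and the transcript and label scheme are hypothetical.

```python
# Sketch: token-level tagging of an undelimited transcript (event boundaries not known in advance).
# Assumes the transformers and torch packages; the head is untrained, so outputs are illustrative only.
import torch
from transformers import AutoModelForTokenClassification, AutoTokenizer

labels = ["O", "EXPOSURE"]  # simple IO scheme: outside vs. inside an exposure event
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForTokenClassification.from_pretrained("bert-base-uncased", num_labels=len(labels))

transcript = "how was your week okay now reach out and touch the doorknob and tell me your anxiety rating"
inputs = tokenizer(transcript, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # shape: (1, num_tokens, num_labels)

# One predicted label per wordpiece token; a fine-tuned model would mark the exposure span.
predictions = logits.argmax(dim=-1)[0]
for token, label_id in zip(tokenizer.convert_ids_to_tokens(inputs["input_ids"][0]), predictions):
    print(f"{token:>12}  {labels[int(label_id)]}")
```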