Harnessing Natural Language Processing for Automated Exposure Therapy Coding in Youth with OCD
Abstract
Objective: To develop and evaluate an automated classification system for labeling Exposure Process Coding System quality codes -- specifically exposure and encourage events -- during in-person exposure therapy sessions, using automatic speech recognition (ASR) and natural language processing techniques.

Methods: The system was trained and tested on 360 manually labeled pediatric OCD therapy sessions from three clinical trials. Audio data were processed with ASR tools (OpenAI's Whisper and Google Speech-to-Text), and manual transcriptions of two-minute audio segments were compared against the ASR-generated transcripts to assess transcription accuracy via word error rate (WER). The resulting text was analyzed with transformer-based models, including BERT, SBERT, and Meta Llama 3. Two classification settings were explored: sequence-level classification, in which events are labeled within delimited text chunks, and token-level classification, in which event boundaries are unknown. Classification was performed either by fine-tuning the transformer-based models or by applying logistic regression to the embeddings each model produced.

Results: Whisper outperformed Google Speech-to-Text, with a lower WER (0.31 vs. 0.51). In the sequence-level setting, Llama 3 models achieved the highest performance, with AUC scores of 0.95 for exposure events and 0.75 for encourage events, outperforming traditional methods and standard BERT models. In the token-level setting, fine-tuned BERT models performed best, achieving AUC scores of 0.85 for exposure events and 0.75 for encourage events.

Conclusion: Automated quality coding of in-person exposure therapy sessions is feasible with current ASR and transformer-based models. These findings suggest potential for real-time quality assessment in clinical practice and for scalable research on effective therapy methods. Future work should focus on optimization, including improvements in ASR accuracy, expanded training datasets, and multimodal data integration.
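The WER comparison described in the Methods can be illustrated with a short sketch. This is not the study's pipeline; it assumes the open-source jiwer package, and the two transcripts are hypothetical placeholders standing in for a manual reference and an ASR hypothesis.

```python
# Sketch: word error rate (WER) between a manual reference transcript and an ASR hypothesis.
# Assumes the jiwer package; the transcript strings are hypothetical placeholders.
import string

import jiwer


def normalize(text: str) -> str:
    """Lowercase and strip punctuation so both transcripts are scored on the same footing."""
    return text.lower().translate(str.maketrans("", "", string.punctuation))


reference = "Let's try touching the doorknob, and then wait before washing your hands."
hypothesis = "lets try touching the door knob then wait before washing your hands"

wer = jiwer.wer(normalize(reference), normalize(hypothesis))
print(f"WER: {wer:.2f}")  # lower is better; 0 means the ASR output matches the manual transcript exactly
```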
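For the sequence-level setting, the abstract mentions logistic regression on embeddings produced by each model. A minimal sketch of that idea, assuming the sentence-transformers and scikit-learn packages and a small hypothetical set of labeled chunks (1 = chunk contains an exposure event), might look like this; it is not the study's implementation.

```python
# Sketch: sequence-level classification of delimited text chunks via sentence embeddings + logistic regression.
# Assumes sentence-transformers and scikit-learn; the labeled chunks below are hypothetical.
from sentence_transformers import SentenceTransformer
from sklearn.linear_model import LogisticRegression

# Hypothetical transcript chunks with binary labels (1 = exposure event present, 0 = absent).
chunks = [
    "Okay, let's touch the doorknob together and see what happens to the anxiety.",
    "How was school this week? Anything fun planned for the weekend?",
    "I want you to hold the sticky substance for one more minute, you can do it.",
    "Let's schedule the next session for Thursday at four.",
]
labels = [1, 0, 1, 0]

# Encode each chunk into a fixed-length embedding with a pretrained SBERT model.
encoder = SentenceTransformer("all-MiniLM-L6-v2")
embeddings = encoder.encode(chunks)

# Fit a simple linear classifier on the embeddings.
clf = LogisticRegression(max_iter=1000).fit(embeddings, labels)

# Score a new, unseen chunk for the probability that it contains an exposure event.
new_chunk = "Great job, now try keeping your hand on the floor for thirty more seconds."
prob = clf.predict_proba(encoder.encode([new_chunk]))[0, 1]
print(f"P(exposure) = {prob:.2f}")
```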
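The token-level setting, where event boundaries are unknown, amounts to tagging each token of an undelimited transcript as inside or outside an event. The sketch below only shows the mechanics with the transformers library: the classification head is freshly initialized rather than fine-tuned as in the study, so its predictions are meaningless and purely illustrative, and the transcript and label scheme are hypothetical.

```python
# Sketch: token-level tagging of an undelimited transcript (event boundaries not known in advance).
# Assumes the transformers and torch packages; the head is untrained, so outputs are illustrative only.
import torch
from transformers import AutoModelForTokenClassification, AutoTokenizer

labels = ["O", "EXPOSURE"]  # simple IO scheme: outside vs. inside an exposure event
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForTokenClassification.from_pretrained("bert-base-uncased", num_labels=len(labels))

transcript = "how was your week okay now reach out and touch the doorknob and tell me your anxiety rating"
inputs = tokenizer(transcript, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # shape: (1, num_tokens, num_labels)

# One predicted label per wordpiece token; a fine-tuned model would mark the exposure span.
predictions = logits.argmax(dim=-1)[0]
for token, label_id in zip(tokenizer.convert_ids_to_tokens(inputs["input_ids"][0]), predictions):
    print(f"{token:>12}  {labels[int(label_id)]}")
```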