A quantitative comparison of AI and human therapists during simulated therapy
Abstract
Artificial intelligence is increasingly integrated into mental health care, yet direct comparisons between human and AI therapists in extended, credible therapeutic interactions remain limited. This study examined a corpus of 233,691 words from 103 therapy sessions to test similarities and differences between human therapists (N = 19) and AI chatbot therapists (N = 16). Simulated Cognitive Behavioural Therapy sessions with three AI-generated client personas produced transcripts that were analysed using natural language processing methods to quantify prose length, sentiment, and semantic similarity (Experiment 1). The transcripts were then rated, blind to condition, by two samples of human raters with varied levels of experience with therapy (Experiments 2a and 2b). Relative to human-led sessions, AI therapists spoke more words, took fewer conversational turns, expressed more positive sentiment, and more closely matched the client’s sentiment and meaning. Skilled raters could reliably detect the AI and disliked the AI therapist, whereas unskilled raters could not reliably detect AI sessions and preferred them to human-led sessions. Overall, these findings suggest that AI-led sessions are quantifiably distinct from those led by humans, that users disliked AI-led sessions when they believed the therapist to be an AI, and that only people with some experience of therapy could reliably distinguish AI-led from human-led sessions.