Early Experiments with Generative AI in a Digital Therapeutic for IBS: Impact on Retention, Therapeutic Alliance, Efficacy, and User Satisfaction
Abstract
Background: Digital therapeutics (DTx) offer scalable access to evidence-based care but are constrained by limited personalization, low adherence, and a weaker therapeutic alliance than in-person care. Nerva, a validated, evidence-based brain-gut therapy app for irritable bowel syndrome (IBS), provides a platform to test whether artificial intelligence (AI) can address these barriers.
Methods: We conducted two naturalistic experiments and one alliance assessment in 2025. Study 1 used a quasi-experimental design comparing users exposed to AI coaching (initial n = 1,910; analyzed n = 424) with historical controls (initial n = 31,370; analyzed n = 16,945). Study 2 was a randomized trial (n = 12,264) comparing standard care (Control) with improved sound quality (No Variety) and daily AI-generated visualization scripts (Variety). Finally, 154 users exposed to AI coaching completed the Session Rating Scale (SRS) to assess therapeutic alliance. The AI-coaching SRS mean score (34.6/40) approached outpatient psychotherapy benchmarks (33–37).
Results:
Study 1 (control n = 16,945; novel program n = 424, after excluding users with missing data): After adjusting for age, gender, and 10 baseline symptom measures, the treatment group completed 0.36 more days on average (95% CI: 0.16-0.57, p = 0.0006). Daily retention improved significantly on Days 2, 3, 5, 6, and 7, with the strongest effect on Day 5 (OR = 1.38, 95% CI: 1.13-1.69, p = 0.0017).
Study 2: Both intervention arms showed higher engagement than Control (No Variety +33.6%; Variety +30.8%), with more users active after Day 14 (Control 55.2%; No Variety 70.2%; Variety 70.7%). Neither intervention altered symptom outcomes, indicating that personalization did not compromise efficacy. Net Promoter Score (NPS) was highest in the Variety arm (48.47 vs 40.9 and 25.4 in No Variety and Control, respectively).
Conclusions: Embedding generative AI into a validated DTx improved early retention and satisfaction without reducing baseline therapeutic impact. These early findings suggest that AI-enabled personalization can safely expand user choice and engagement, demonstrate the feasibility of AI-enhanced digital therapeutics, and point toward “AI therapeutics” that combine scalability with personalization and relational depth.
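Readers who want to sanity-check the Study 1 statistics can relate each reported 95% confidence interval to its p-value. The sketch below is a back-of-envelope check, not the authors' analysis: it assumes symmetric Wald-type (normal approximation) intervals, computed on the log scale for the odds ratio, and the helper name `p_from_ci` is hypothetical.

```python
import math

Z_95 = 1.959964  # two-sided 95% critical value of the standard normal


def p_from_ci(estimate, lower, upper, log_scale=False):
    """Approximate z and two-sided p from an estimate and its 95% CI.

    Assumes a Wald-type interval (estimate +/- Z_95 * SE). For ratio
    measures such as odds ratios, set log_scale=True so the interval is
    treated as symmetric on the log scale.
    """
    if log_scale:
        estimate, lower, upper = (math.log(v) for v in (estimate, lower, upper))
    se = (upper - lower) / (2 * Z_95)       # CI width spans 2 * Z_95 standard errors
    z = estimate / se                       # test against a null of 0 (or OR = 1)
    p = math.erfc(abs(z) / math.sqrt(2))    # two-sided normal tail probability
    return z, p


# Values reported in the abstract for Study 1
z_days, p_days = p_from_ci(0.36, 0.16, 0.57)                 # extra days completed
z_or, p_or = p_from_ci(1.38, 1.13, 1.69, log_scale=True)     # Day 5 retention OR

print(f"Mean difference: z = {z_days:.2f}, p = {p_days:.4f}")
print(f"Day 5 odds ratio: z = {z_or:.2f}, p = {p_or:.4f}")
```

Under these assumptions, both recomputed p-values round to the figures in the abstract (about 0.0006 and 0.0017), which suggests the reported intervals and p-values are internally consistent.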