Evaluating a Custom Chatbot in Undergraduate Medical Education: Randomised Crossover Mixed-Methods Evaluation of Performance, Utility, and Perceptions

Abstract

Background: While large language model (LLM) chatbots are gaining popularity in medical education, their pedagogical impact remains under-evaluated. This study examined the effects of a domain-specific chatbot on performance, perception, and cognitive engagement among medical students.

Methods: Twenty first-year medical students completed two academic tasks using either a custom-built educational chatbot (Lenny AI by qVault) or conventional study methods in a randomised crossover design. Performance was assessed through Single Best Answer (SBA) questions, while post-task surveys (Likert scales) and focus groups were used to explore user perceptions. Statistical tests compared performance and perception metrics; qualitative data underwent thematic analysis with independent coding (κ = 0.403–0.633).

Results: Participants rated the chatbot significantly higher than conventional resources for ease of use, satisfaction, engagement, perceived quality, and clarity (p < 0.05). Lenny AI use was positively correlated with perceived efficiency and confidence but showed no significant performance gains. Thematic analysis revealed accelerated factual retrieval but limited support for higher-level cognitive reasoning. Students expressed high functional trust but raised concerns about transparency.

Conclusions: The custom chatbot improved usability; effects on deeper learning were not detected within the tasks studied. Future designs should support adaptive scaffolding, transparent sourcing, and critical engagement to improve educational value.