Evaluating a Custom Chatbot in Undergraduate Medical Education: Randomised Crossover Mixed-Methods Evaluation of Performance, Utility, and Perceptions
Abstract
Background: While large language model (LLM) chatbots are gaining popularity in medical education, their pedagogical impact remains under-evaluated. This study examined the effects of a domain-specific chatbot on performance, perceptions, and cognitive engagement among medical students.

Methods: Twenty first-year medical students completed two academic tasks using either a custom-built educational chatbot (Lenny AI by qVault) or conventional study methods in a randomised crossover design. Performance was assessed with Single Best Answer (SBA) questions, while post-task surveys (Likert scales) and focus groups explored user perceptions. Statistical tests compared performance and perception metrics; qualitative data underwent thematic analysis with independent coding (κ = 0.403–0.633).

Results: Participants rated the chatbot significantly higher than conventional resources for ease of use, satisfaction, engagement, perceived quality, and clarity (p < 0.05). Lenny AI use was positively correlated with perceived efficiency and confidence but yielded no significant gains in SBA performance. Thematic analysis revealed accelerated factual retrieval but limited support for higher-order cognitive reasoning. Students expressed high functional trust but raised concerns about transparency.

Conclusions: The custom chatbot improved usability, but no effects on deeper learning were detected within the tasks studied. Future designs should support adaptive scaffolding, transparent sourcing, and critical engagement to improve educational value.
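The abstract reports paired perception comparisons and inter-rater agreement for the thematic coding without naming the specific procedures. As an illustrative sketch only, not the authors' actual analysis, the Python snippet below shows one common way such data are handled: a Wilcoxon signed-rank test on paired Likert ratings from a crossover design, and Cohen's κ between two independent coders. All data values, variable names, and test choices here are hypothetical assumptions.

```python
# Illustrative sketch only (not from the paper): paired comparison of
# Likert ratings, plus inter-rater agreement for thematic coding.
from scipy.stats import wilcoxon
from sklearn.metrics import cohen_kappa_score

# Hypothetical per-participant ease-of-use ratings (1-5 Likert scale),
# one pair per student: chatbot condition vs. conventional condition.
chatbot      = [5, 4, 5, 4, 5, 3, 4, 5, 4, 4]
conventional = [3, 3, 4, 2, 4, 3, 3, 4, 3, 2]

# Wilcoxon signed-rank test: a common nonparametric choice for paired
# ordinal data; the abstract does not say which tests were used.
stat, p = wilcoxon(chatbot, conventional)
print(f"Wilcoxon W = {stat:.1f}, p = {p:.3f}")

# Cohen's kappa between two coders' theme labels for the same excerpts;
# the reported range (0.403-0.633) falls roughly in the moderate-to-
# substantial band on the Landis & Koch benchmark.
coder_a = ["retrieval", "trust", "reasoning", "trust", "retrieval"]
coder_b = ["retrieval", "trust", "retrieval", "trust", "retrieval"]
print(f"kappa = {cohen_kappa_score(coder_a, coder_b):.3f}")
```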