Evaluating the Readability and Quality of AI-Generated Scoliosis Education Materials: A Comparative Analysis of Five Language Models

Abstract

Introduction: Accurate and comprehensible health information is essential for medical decision-making, yet AI-generated health content varies in readability and quality. In adolescent idiopathic scoliosis (AIS), where treatment decisions depend on complex factors, reliance on AI-generated materials raises concerns about accuracy and accessibility. This study evaluates the readability and quality of AI-generated scoliosis education materials to assess their effectiveness in improving health literacy.

Methods: Five AI models (ChatGPT-4o, ChatGPT-o1, ChatGPT-o3 mini-high, DeepSeek-V3, DeepSeek-R1) were tested on three scoliosis-related inquiries. Readability was assessed using the Flesch-Kincaid Grade Level (FKGL) and Flesch Reading Ease Score (FRES), and content quality was evaluated using the DISCERN instrument. Statistical analyses were performed in RStudio.

Results: DeepSeek-R1 achieved the lowest FKGL (6.2) and the highest FRES (64.5), indicating superior readability. In contrast, ChatGPT-o1 and ChatGPT-o3 mini-high scored above FKGL 12.0, requiring college-level reading skills. Despite these readability differences, DISCERN scores remained stable across models (~50.5), suggesting comparable content quality. However, all responses lacked citations, limiting their reliability.

Conclusion: AI-generated scoliosis education materials vary significantly in readability, with DeepSeek-R1 being the most accessible. Future AI models should improve readability without compromising information accuracy and should integrate real-time citation mechanisms to improve trustworthiness.
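For readers unfamiliar with the metrics, FKGL and FRES are the standard published Flesch formulas: FKGL = 0.39(words/sentences) + 11.8(syllables/words) − 15.59, and FRES = 206.835 − 1.015(words/sentences) − 84.6(syllables/words). The sketch below is a minimal Python illustration of these formulas, not the tooling used in the study; the vowel-group syllable counter is an assumed rough heuristic, whereas published scores rely on more careful syllabification.

```python
import re

def count_syllables(word: str) -> int:
    # Assumed heuristic: count vowel groups, dropping a typical silent
    # trailing 'e'. Real readability tools syllabify more carefully.
    word = word.lower()
    n = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and n > 1:
        n -= 1
    return max(n, 1)

def readability(text: str) -> tuple[float, float]:
    """Return (FKGL, FRES) via the standard Flesch formulas."""
    sentences = max(len(re.findall(r"[.!?]+", text)), 1)
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    w = len(words)
    fkgl = 0.39 * (w / sentences) + 11.8 * (syllables / w) - 15.59
    fres = 206.835 - 1.015 * (w / sentences) - 84.6 * (syllables / w)
    return fkgl, fres

sample = ("Scoliosis is a sideways curvature of the spine. "
          "Mild curves often need only observation.")
print(readability(sample))  # lower FKGL and higher FRES mean easier text
```

Note the trade-off both formulas encode: shorter sentences and shorter words lower the grade level (FKGL) and raise the ease score (FRES), which is why the same content can score very differently across models.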
