Evaluating the Readability and Quality of AI-Generated Scoliosis Education Materials: A Comparative Analysis of Five Language Models
Abstract
Introduction: Accurate, comprehensible health information is essential for medical decision-making, yet AI-generated health content varies in readability and quality. In adolescent idiopathic scoliosis (AIS), where treatment decisions depend on complex factors, reliance on AI-generated materials raises concerns about both accuracy and accessibility. This study evaluates the readability and quality of AI-generated scoliosis education materials to assess their effectiveness in improving health literacy.

Methods: Five AI models (ChatGPT-4o, ChatGPT-o1, ChatGPT-o3 mini-high, DeepSeek-V3, DeepSeek-R1) were tested on three scoliosis-related queries. Readability was assessed with the Flesch-Kincaid Grade Level (FKGL) and the Flesch Reading Ease Score (FRES), and content quality was evaluated with the DISCERN instrument. Statistical analyses were performed in RStudio.

Results: DeepSeek-R1 achieved the lowest FKGL (6.2) and the highest FRES (64.5), indicating the most readable output. In contrast, ChatGPT-o1 and ChatGPT-o3 mini-high scored above FKGL 12.0, demanding college-level reading skills. Despite these readability differences, DISCERN scores remained stable across models (~50.5), suggesting comparable content quality. However, no model provided citations, limiting the reliability of the responses.

Conclusion: AI-generated scoliosis education materials vary significantly in readability, with DeepSeek-R1 being the most accessible. Future models should improve readability without compromising accuracy and should integrate real-time citation mechanisms to strengthen trustworthiness.
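For context, the two readability indices named in the Methods are deterministic functions of sentence, word, and syllable counts. The standard published Flesch formulas (not restated in the abstract itself) are:

\[
\mathrm{FKGL} = 0.39\,\frac{\text{total words}}{\text{total sentences}} + 11.8\,\frac{\text{total syllables}}{\text{total words}} - 15.59
\]

\[
\mathrm{FRES} = 206.835 - 1.015\,\frac{\text{total words}}{\text{total sentences}} - 84.6\,\frac{\text{total syllables}}{\text{total words}}
\]

Lower FKGL and higher FRES both indicate easier text, which is why DeepSeek-R1's combination of FKGL 6.2 and FRES 64.5 marks it as the most readable model in this study.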
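As a minimal sketch of how such scores can be computed programmatically, the snippet below uses the open-source textstat Python package; this is an illustrative assumption, not the authors' analysis code, which the abstract says was run in RStudio, and the sample text is hypothetical.

# Illustrative only: score one AI-generated response with textstat.
import textstat

response = (
    "Scoliosis is a sideways curve of the spine. "
    "Doctors measure the curve on an X-ray. "
    "Small curves are often watched over time, while larger curves "
    "may need a brace or surgery."
)

fkgl = textstat.flesch_kincaid_grade(response)  # grade level; lower = easier
fres = textstat.flesch_reading_ease(response)   # 0-100 scale; higher = easier
print(f"FKGL: {fkgl:.1f}  FRES: {fres:.1f}")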