Comorbidity classification from clinical free-text using large language models: application to sleep disorder patients
Abstract
Patients presenting to neurology clinics commonly have a complex history of comorbidities and only partially documented health trajectories, so reliable extraction of comorbidity information from historical records is essential. However, existing extraction methods, from rule-based systems to classical machine learning (ML), often fall short in accuracy, scalability, or adaptability across diverse document types.

In this study, we present a large language model (LLM)-based framework for extracting comorbidities from diagnostic texts. The framework handles a range of prompt formats and textual sources, such as patient histories, prior diagnoses, and structured sleep assessments. The fine-tuned Mistral-24B (Instruct-2501) model achieves 95% macro classification accuracy and a 92% F1 score across six common comorbidity classes, substantially outperforming prior state-of-the-art approaches. Because the method extracts comorbidities through a transparent hierarchical approach, it supports clinical analysis and provides interpretable insights for disease modeling and personalized treatment planning in sleep medicine.
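The abstract reports macro-averaged scores over six comorbidity classes but does not say exactly how they are computed. One common reading, sketched below purely as an assumption, treats each class as a binary present/absent decision per record. It computes accuracy and F1 separately for each class and then averages the per-class values with equal weight. The function names, the label-matrix layout, the zero-division convention, and the three-class toy data are all hypothetical and are not taken from the paper. The authors may define "macro accuracy" differently, for example as balanced accuracy.

```python
from typing import List, Sequence, Tuple

# Hypothetical multi-label layout (assumption, not from the paper):
# row i = one patient record, column k = 1 if comorbidity class k is present.
LabelMatrix = Sequence[Sequence[int]]


def per_class_counts(truth: Sequence[int], pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn) for a single binary class column."""
    tp = sum(1 for t, p in zip(truth, pred) if t and p)
    fp = sum(1 for t, p in zip(truth, pred) if not t and p)
    fn = sum(1 for t, p in zip(truth, pred) if t and not p)
    tn = sum(1 for t, p in zip(truth, pred) if not t and not p)
    return tp, fp, fn, tn


def macro_accuracy_f1(y_true: LabelMatrix, y_pred: LabelMatrix) -> Tuple[float, float]:
    """Unweighted mean of per-class accuracy and per-class F1."""
    n_classes = len(y_true[0])
    accs: List[float] = []
    f1s: List[float] = []
    for k in range(n_classes):
        truth = [row[k] for row in y_true]
        pred = [row[k] for row in y_pred]
        tp, fp, fn, tn = per_class_counts(truth, pred)
        accs.append((tp + tn) / len(truth))
        # F1 = 2*TP / (2*TP + FP + FN), the harmonic mean of precision and recall.
        denom = 2 * tp + fp + fn
        # Assumed convention: F1 is 0.0 if the class is never present nor predicted.
        f1s.append(2 * tp / denom if denom else 0.0)
    return sum(accs) / n_classes, sum(f1s) / n_classes


# Toy example: 4 records, 3 classes (illustrative only; the paper uses six).
y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 1]]
y_pred = [[1, 0, 0], [0, 1, 0], [1, 0, 0], [0, 0, 1]]
acc, f1 = macro_accuracy_f1(y_true, y_pred)
print(round(acc, 3), round(f1, 3))  # 0.833 0.778
```

Macro averaging gives each comorbidity class equal weight regardless of how often it occurs. A rare class with poor recall therefore pulls the macro F1 down as much as a common one would, which is one plausible reason a macro F1 can sit below macro accuracy when true negatives dominate.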