From Narratives to Diagnosis: A Machine Learning Framework for Classifying Sleep Disorders in Aging Populations: The sleepCare Platform

Christos A. Frantzidis

Abstract

Background/Objectives: Sleep disorders are prevalent among aging populations and are often linked to cognitive decline, chronic conditions, and reduced quality of life. Traditional diagnostic methods, such as polysomnography, are resource-intensive and limited in accessibility. Meanwhile, individuals frequently describe their sleep experiences through unstructured narratives in clinical notes, online forums, and telehealth platforms. This study proposes a machine learning pipeline (sleepCare) that classifies sleep-related narratives into clinically meaningful categories, including stress-related, neurodegenerative, and breathing-related disorders. The proposed framework employs natural language processing (NLP) and machine learning techniques to support remote applications and real-time patient monitoring, offering a scalable solution for the early identification of sleep disturbances.

Methods: The sleepCare platform consists of a three-tiered classification pipeline for analyzing narrative sleep reports. First, a baseline model used a Multinomial Naïve Bayes classifier with n-gram features from a Bag-of-Words representation. Next, a Support Vector Machine (SVM) was trained on GloVe-based word embeddings to capture semantic context. Finally, a transformer-based model (BERT) was fine-tuned to extract contextual embeddings, using the [CLS] token as input for SVM classification. Each model was evaluated using stratified train-test splits and 10-fold cross-validation, with hyperparameters tuned via GridSearchCV. The dataset contained 475 labeled sleep narratives, classified into five etiological categories relevant for clinical interpretation.

Results: The transformer-based model, which combined BERT embeddings with an optimized SVM classifier, achieved an overall accuracy of 81% on the test set. Class-wise F1-scores ranged from 0.72 to 0.91, with the highest performance observed in classifying normal or improved sleep (F1 = 0.91). The macro-average F1-score was 0.78, indicating balanced performance across all categories. GridSearchCV identified the optimal SVM parameters (C = 4, kernel = ‘rbf’, gamma = 0.01, degree = 2, class_weight = ‘balanced’). The confusion matrix revealed robust classification with limited misclassifications, which occurred mainly between overlapping symptom categories such as stress-related and neurodegenerative sleep disturbances.

Conclusions: Unlike generic large language model applications, our approach emphasizes the personalized identification of sleep symptomatology through targeted classification of the narrative input. By integrating structured learning with contextual embeddings, the framework offers a clinically meaningful, scalable solution for early detection and differentiation of sleep disorders in diverse, real-world, and remote settings.
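To make the baseline tier from the Methods concrete, the following is a minimal sketch, assuming scikit-learn, of a Bag-of-Words n-gram representation feeding a Multinomial Naïve Bayes classifier, tuned with GridSearchCV on a stratified split and scored with macro-averaged F1. The toy narratives, the three class names, the parameter grid, and the variable names are all hypothetical and chosen for illustration; they are not the study's data or its reported settings. The sketch uses 2-fold cross-validation only because the toy corpus is tiny, whereas the abstract reports 10-fold.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import f1_score
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Hypothetical toy narratives (NOT the study's 475-narrative dataset).
TOY = {
    "stress": [
        "I lie awake worrying about work deadlines",
        "racing thoughts about money keep me awake",
        "anxiety about my job stops me falling asleep",
        "I wake up at night worrying about bills",
        "stress from work keeps my mind racing at night",
        "I cannot sleep because I keep worrying",
    ],
    "breathing": [
        "my partner says I stop breathing and snore loudly",
        "I wake up gasping for air and snoring",
        "loud snoring and choking wake me up",
        "I snore and wake up short of breath",
        "gasping and choking during the night",
        "I stop breathing in my sleep and snore",
    ],
    "normal": [
        "I slept well and feel rested",
        "my sleep improved and I feel refreshed",
        "I fall asleep quickly and wake up rested",
        "sleeping well lately and feeling refreshed",
        "I feel rested after a full night of sleep",
        "my sleep is good and I wake refreshed",
    ],
}
texts = [t for items in TOY.values() for t in items]
labels = [name for name, items in TOY.items() for _ in items]

# Stratified hold-out split: 12 training and 6 test narratives.
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=6, stratify=labels, random_state=0
)

# Bag-of-Words n-gram counts followed by Multinomial Naive Bayes.
pipeline = Pipeline([
    ("bow", CountVectorizer()),
    ("nb", MultinomialNB()),
])

# Illustrative grid (an assumption, not the paper's search space).
param_grid = {
    "bow__ngram_range": [(1, 1), (1, 2)],
    "nb__alpha": [0.1, 1.0],
}

search = GridSearchCV(
    pipeline,
    param_grid,
    cv=StratifiedKFold(n_splits=2, shuffle=True, random_state=0),
    scoring="f1_macro",
)
search.fit(X_train, y_train)

# Evaluate the refitted best pipeline on the held-out narratives.
preds = search.predict(X_test)
macro_f1 = f1_score(y_test, preds, average="macro")
print("best parameters:", search.best_params_)
print("macro F1 on toy test split:", round(macro_f1, 2))
```

The later tiers would keep the same evaluation scaffold and swap the feature step: averaged GloVe vectors or BERT [CLS] embeddings in place of the n-gram counts, with an SVM in place of the Naïve Bayes classifier.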

Version published to 10.3390/brainsci15070667
Jun 20, 2025
Version published to 10.20944/preprints202506.0711.v1
Jun 10, 2025

Related articles

Empirical Review of LLM-driven Classification of Multidimensional Sleep Health Mentions from Free-Text Clinical Notes

This article has 5 authors:
1. Syed-Amad Hussain
2. Ariana Calloway
3. Joseph Sirrianni
4. Eric Fosler-Lussier
5. Mattina Davenport
This article has no evaluations. Latest version: Jun 5, 2025

Development of a Rule-Based Natural Language Processing Algorithm to Extract Sleep Information in Pediatric Primary Care Patients with a Sleep Diagnosis

This article has 9 authors:
1. Joseph W. Sirrianni
2. Ariana Calloway
3. Syed-Amad Hussain
4. Deena Chisolm
5. Kelly Kelleher
6. Azizi Seixas
7. Hongfang Liu
8. Christopher Bartlett
9. Mattina A. Davenport
This article has no evaluations. Latest version: Jun 1, 2025

Data-driven discovery of core sleep biomarkers for predicting early cardiometabolic risk in a healthy population using machine learning

This article has 1 author:
1. Zeren Yu
This article has no evaluations. Latest version: Jun 16, 2025
