Empirical Review of LLM-driven Classification of Multidimensional Sleep Health Mentions from Free-Text Clinical Notes

Syed-Amad Hussain
Ariana Calloway
Joseph Sirrianni
Eric Fosler-Lussier
Mattina Davenport


Abstract

Accurate multidimensional sleep health (MSH) information is often fragmented and inconsistently represented within hospital infrastructures, leaving crucial details buried in unstructured clinical notes rather than discrete fields. This inconsistency complicates large-scale phenotyping, secondary analyses, and clinical decision support regarding sleep-related outcomes. In this work, we systematically explore contemporary natural language processing techniques, namely prompt-based large language models (LLMs) and fine-tuned discriminative classifiers, to bridge this gap. We evaluate performance on extracting nine key MSH dimensions (timing, duration, efficiency, sleep disorders, daytime sleepiness, interventions, medication, behavior, and satisfaction) from clinical narratives using public datasets (MIMIC-III derivatives) and an internally annotated pediatric sleep corpus.

Initially, we assess generative LLM performance using dynamic few-shot prompting, analyzing impacts from varying prompt structures, example quantity, and domain-specificity without explicit task-specific fine-tuning. Subsequently, we fine-tune generative LLM architectures on both in-task and out-of-task data to quantify performance improvements and limitations. Lastly, we benchmark these generative approaches against encoder-based discriminative classifiers (ModernBERT), designed to directly estimate binary presence of each MSH class within full clinical notes.
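To make the idea of dynamic few-shot prompting concrete, the following is a minimal sketch in which the in-context examples are chosen per note by similarity to that note. The abstract does not say how examples are retrieved, so the bag-of-words cosine similarity, the function names (`select_examples`, `build_prompt`), the prompt wording, and the example pool are all assumptions made for illustration; a real system would more likely use a learned embedding model.

```python
import math
import re
from collections import Counter

# The nine MSH dimensions named in the abstract.
MSH_DIMENSIONS = [
    "timing", "duration", "efficiency", "sleep disorders",
    "daytime sleepiness", "interventions", "medication",
    "behavior", "satisfaction",
]

def bag_of_words(text):
    """Lowercased token counts; an assumed stand-in for a real embedding."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def select_examples(query, pool, k=2):
    """Return the k annotated examples most similar to the query note."""
    q = bag_of_words(query)
    ranked = sorted(
        pool,
        key=lambda ex: cosine(q, bag_of_words(ex["text"])),
        reverse=True,
    )
    return ranked[:k]

def build_prompt(query, pool, k=2):
    """Assemble a few-shot prompt whose examples depend on the query."""
    lines = [
        "Label the note with any of these sleep health dimensions: "
        + ", ".join(MSH_DIMENSIONS) + ".",
        "",
    ]
    for ex in select_examples(query, pool, k):
        lines.append(f"Note: {ex['text']}")
        lines.append(f"Labels: {', '.join(ex['labels']) or 'none'}")
        lines.append("")
    lines.append(f"Note: {query}")
    lines.append("Labels:")
    return "\n".join(lines)

# Hypothetical annotated pool: invented sentences, not drawn from any dataset.
POOL = [
    {"text": "Patient goes to bed at midnight and wakes at 6 am.",
     "labels": ["timing", "duration"]},
    {"text": "Started melatonin 3 mg nightly for sleep onset.",
     "labels": ["medication"]},
    {"text": "Snoring with witnessed apneas, referred for sleep study.",
     "labels": ["sleep disorders"]},
]

if __name__ == "__main__":
    print(build_prompt("Takes melatonin nightly before bed.", POOL, k=2))
```

Varying `k` and the contents of the pool corresponds loosely to the example-quantity and domain-specificity factors the abstract mentions.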

Our experiments demonstrate that fine-tuned discriminative models consistently provide higher classification accuracy, lower inference latency, and more robust span-level identification than either prompted or fine-tuned generative LLMs, given adequate training data. Nonetheless, generative LLMs retain moderate utility in low-data scenarios. Importantly, our results highlight persistent challenges, including difficulty extracting subtle sleep constructs such as sleep efficiency and daytime sleepiness, and biases associated with patient demographics and clinical departments. We conclude by suggesting future research directions: refining span extraction methods, mitigating biases in model performance, and exploring advanced chain-of-thought prompting techniques to achieve reliable, scalable MSH phenotyping within real-world clinical systems.
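Because each note can mention several MSH dimensions at once, comparisons like the one above are usually made per dimension. The sketch below computes a per-dimension F1 score from gold and predicted label sets. The abstract does not name the exact metric used, so F1 is an assumption here, and `per_dimension_f1` and the toy labels are hypothetical.

```python
def per_dimension_f1(gold, pred, dimensions):
    """Compute F1 for each dimension treated as a binary presence label.

    gold, pred: lists of label sets, one set per note, in the same order.
    """
    scores = {}
    for dim in dimensions:
        tp = sum(1 for g, p in zip(gold, pred) if dim in g and dim in p)
        fp = sum(1 for g, p in zip(gold, pred) if dim not in g and dim in p)
        fn = sum(1 for g, p in zip(gold, pred) if dim in g and dim not in p)
        denom = 2 * tp + fp + fn
        # With no gold or predicted positives, F1 is undefined; report 0.0.
        scores[dim] = 2 * tp / denom if denom else 0.0
    return scores

# Toy labels for three notes (invented for illustration only).
gold = [{"timing"}, {"timing", "medication"}, set()]
pred = [{"timing"}, {"medication"}, {"medication"}]

scores = per_dimension_f1(gold, pred, ["timing", "medication", "efficiency"])
for dim, f1 in scores.items():
    print(f"{dim}: {f1:.2f}")
```

Reporting scores this way makes it easy to see which constructs, such as sleep efficiency or daytime sleepiness, lag behind the others.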

Version published to 10.1101/2025.06.04.25328983v1 on medRxiv
Jun 5, 2025

Related articles

From Keywords to Context: Bridging Expert Insight and Language Models for Multidimensional Sleep Health Classification in Clinical Notes

This article has 5 authors:
1. Syed-Amad Hussain
2. Ariana Calloway
3. Joseph W Sirrianni
4. Eric Fosler-Lussier
5. Mattina Davenport
This article has no evaluations. Latest version Jun 7, 2025

Automated Insomnia Phenotyping from Electronic Health Records: Leveraging Large Language Models to Decode Clinical Narratives

This article has 11 authors:
1. Guillermo Lopez-Garcia
2. Davy Weissenbacher
3. Matthew Stadler
4. Karen O’Connor
5. Dongfang Xu
6. Lauren Gryboski
7. Jared Heavens
8. Noor Abu-el-Rub
9. Diego R. Mazzotti
10. Subhajit Chakravorty
11. Graciela Gonzalez-Hernandez
This article has no evaluations. Latest version Jun 3, 2025

From Narratives to Diagnosis: A Machine Learning Framework for Classifying Sleep Disorders in Aging Populations: The sleepCare Platform

This article has 1 author:
1. Christos A. Frantzidis
This article has no evaluations. Latest version Jun 20, 2025
