From Keywords to Context: Bridging Expert Insight and Language Models for Multidimensional Sleep Health Classification in Clinical Notes
Abstract
Accurate detection of multidimensional sleep health (MSH) information from electronic health records (EHRs) is critical for improving clinical decision-making but remains challenging due to sparse documentation and class imbalance. This study investigates whether integrating expert-guided annotations and keyword-based heuristics with large language models (LLMs) enhances the extraction of nuanced MSH indicators from clinical narratives. Using a novel, expertly annotated dataset (NCH-Sleep), we trained and evaluated models to classify clinical notes across nine clinically relevant MSH categories. Our baseline model demonstrated substantial predictive capability using raw text alone. Incorporating manually annotated spans (oracle annotations) dramatically improved performance, highlighting the benefit of targeted expert guidance. Additionally, employing curated keyword annotations within varying context windows significantly enhanced model interpretability while retaining strong predictive accuracy. Through detailed bias analyses, we identified consistent performance across demographics and clinical settings, although specific disparities underscored the importance of balanced expert oversight. Our findings emphasize the value of expert-informed supervision and heuristic approaches in building scalable, interpretable clinical NLP systems for sleep health classification.
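The keyword-with-context-window idea mentioned in the abstract can be illustrated with a minimal sketch. It assumes whitespace tokenization and a symmetric window measured in tokens; the keyword list, the function name `keyword_windows`, and the window sizes are hypothetical and are not taken from the study's curated lexicon or implementation.

```python
import re

# Hypothetical keywords for illustration only; not the study's curated lexicon.
KEYWORDS = ["snoring", "insomnia", "daytime sleepiness"]


def keyword_windows(note, keywords, window):
    """Return text spans covering each keyword match plus `window` tokens on each side."""
    tokens = note.split()
    # Normalize tokens for matching: strip punctuation and lowercase.
    lowered = [re.sub(r"\W+", "", t).lower() for t in tokens]
    spans = []
    for kw in keywords:
        kw_tokens = kw.lower().split()
        n = len(kw_tokens)
        for i in range(len(tokens) - n + 1):
            if lowered[i:i + n] == kw_tokens:
                # Clip the window at the boundaries of the note.
                start = max(0, i - window)
                end = min(len(tokens), i + n + window)
                spans.append(" ".join(tokens[start:end]))
    return spans


note = "Patient reports loud snoring and daytime sleepiness during school."
narrow = keyword_windows(note, KEYWORDS, window=0)
wide = keyword_windows(note, KEYWORDS, window=1)
```

Under these assumptions, a wider window passes more surrounding context to a downstream classifier, while a narrower one keeps the input closer to the keyword itself and is easier to inspect.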