Automated Insomnia Phenotyping from Electronic Health Records: Leveraging Large Language Models to Decode Clinical Narratives

Abstract

Insomnia is a highly prevalent but often underdiagnosed condition in clinical practice. Its inconsistent documentation in electronic health records (EHRs) limits population-level analyses and obstructs efforts to evaluate treatment patterns or outcomes. We present a novel, fully automated approach for phenotyping insomnia directly from unstructured clinical notes using generative large language models (LLMs). Leveraging prompt engineering with few-shot learning and chain-of-thought reasoning, we evaluated our system on two distinct corpora: inpatient clinical notes from MIMIC-III and outpatient primary care notes from the University of Kansas Health System (KUMC). Our models—Llama 70B and Llama 405B—achieved F1 scores of 93.0 on the MIMIC corpus and 85.7 on the KUMC corpus, substantially outperforming domain-adapted BERT-based classifiers. Ultimately, our framework offers a scalable and interpretable solution for clinical phenotyping of insomnia and can serve as a blueprint for similar efforts targeting other underdiagnosed or under-documented conditions in the EHR.
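
The abstract names three ingredients: few-shot prompting, chain-of-thought reasoning, and F1-based evaluation. The following is a minimal Python sketch of how they might fit together, not the authors' implementation. The example notes, the prompt wording, and the helper names (`Example`, `build_prompt`, `parse_answer`, `f1_score`) are all assumptions made for illustration, and the LLM call itself is left out.

```python
from dataclasses import dataclass


@dataclass
class Example:
    note: str       # snippet of clinical text
    reasoning: str  # worked chain-of-thought for this snippet
    label: str      # "yes" or "no"


# Invented few-shot examples; not drawn from MIMIC-III or KUMC.
FEW_SHOT = [
    Example(
        note="Pt reports difficulty falling asleep most nights; started on trazodone.",
        reasoning="The note documents a sleep-onset complaint and a sleep-aid prescription.",
        label="yes",
    ),
    Example(
        note="Sleeping well. No new complaints. Continue lisinopril.",
        reasoning="The note reports normal sleep and no sleep medication.",
        label="no",
    ),
]


def build_prompt(note, examples=FEW_SHOT):
    """Assemble a few-shot prompt that asks for reasoning before the answer."""
    parts = [
        "Decide whether the clinical note indicates insomnia. "
        "Reason step by step, then answer 'yes' or 'no'.",
        "",
    ]
    for ex in examples:
        parts += [
            f"Note: {ex.note}",
            f"Reasoning: {ex.reasoning}",
            f"Answer: {ex.label}",
            "",
        ]
    # Leave the final reasoning open for the model to complete.
    parts += [f"Note: {note}", "Reasoning:"]
    return "\n".join(parts)


def parse_answer(completion):
    """Return the label on the last 'Answer:' line, or None if there is none."""
    for line in reversed(completion.splitlines()):
        if line.strip().lower().startswith("answer:"):
            return line.split(":", 1)[1].strip().lower().rstrip(".")
    return None


def f1_score(gold, pred, positive="yes"):
    """F1 for the positive class, on a 0-1 scale."""
    pairs = list(zip(gold, pred))
    tp = sum(g == positive and p == positive for g, p in pairs)
    fp = sum(g != positive and p == positive for g, p in pairs)
    fn = sum(g == positive and p != positive for g, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

In use, the string from `build_prompt` would be sent to the model, the completion passed through `parse_answer`, and the predicted labels compared with gold annotations using `f1_score`. The abstract reports F1 on a 0-100 scale, so the value returned here would be multiplied by 100 for comparison.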
