Beyond Metrics to Methods: A Scoping Review of Large Language Models for Detection of Social Drivers of Health in Clinical Notes
Abstract
Objective
This scoping review aimed to map current applications of large language models (LLMs) for extracting social drivers of health (SDoH), benchmark model performance across SDoH domains to define the state of the field, and evaluate methodological approaches to identify research gaps and guide clinical deployment.
Materials and Methods
We searched PubMed, Web of Science, Embase, Scopus, and IEEE Xplore for studies applying LLMs to detect SDoH. We applied a novel methodological framework integrating: (1) a hierarchical classification system for SDoH domains and LLM architectures; (2) a systematic approach to synthesizing performance metrics; and (3) a custom seven-domain instrument for assessing methodological rigor.
Results
Forty-two studies met the inclusion criteria. Behavioral Factors had the highest median F1-score (0.87), while Health Care Access and Quality showed the lowest median and greatest variability (median F1 = 0.59). Research was concentrated in the United States (85.7%), relied largely on private institutional datasets (69%), and often focused on critical care populations (45.2%). Methodological assessment revealed that only 29% of studies provided annotation guidelines, 24% assessed fairness across demographic groups, and 21% validated models externally.
Discussion and Conclusion
Progress in using LLMs for SDoH extraction is limited by variable performance, weak methodological rigor in published studies, and minimal attention to fairness and generalizability. Key gaps include the absence of published annotation guidelines, fairness assessments, and external model validation. LLMs show strong potential for extracting SDoH from clinical text, but moving the field toward clinical deployment demands more standardized, transparent, and robust research.