Natural Language Processing for assessing multimorbidity: A systematic review
Abstract
Background
Multimorbidity poses significant healthcare challenges globally. Current assessment methods rely primarily on structured electronic health record (EHR) data, potentially missing valuable information contained in unstructured clinical notes. Natural language processing (NLP) techniques offer promising solutions for extracting comprehensive multimorbidity data from these unstructured sources.
Objectives
To identify, characterize, and critically appraise studies utilizing NLP techniques for multimorbidity assessment from unstructured EHR data.
Methods
This systematic review will follow PRISMA-P guidelines and be conducted according to Cochrane and Joanna Briggs Institute methodologies. We will search multiple databases (PubMed, Web of Science, Embase, CINAHL, MEDLINE, Cochrane Library, PsycINFO, and Scopus) from inception to February 2025. Eligible studies will include adult populations with multimorbidity (≥2 chronic conditions) where NLP techniques were applied to unstructured EHR data and compared against reference standards. Two independent reviewers will screen studies, extract data, and assess methodological quality using PROBAST, CLAIM, TRIPOD, and QUADAS-2 tools as appropriate. Depending on study heterogeneity, results will be pooled quantitatively (meta-analysis) or synthesized narratively.
Outcomes
Primary outcomes are validity metrics of NLP-based multimorbidity assessment against the reference standard: sensitivity, specificity, positive predictive value, F1 score, and area under the receiver operating characteristic curve (AUROC). Secondary outcomes include reliability measures, generalizability assessments, efficiency metrics, and end-user perspectives.
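To make the threshold-based validity metrics concrete, the following is a minimal sketch of how they could be computed when NLP-extracted conditions are compared with a reference standard at the patient level. It is illustrative only and not a method taken from the protocol: the names `ConfusionCounts`, `has_multimorbidity`, `tally`, and `validity_metrics` are hypothetical, the example patients are invented, and the only detail drawn from the review is the definition of multimorbidity as two or more chronic conditions. AUROC is omitted because it requires continuous scores rather than binary flags.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Set, Tuple


@dataclass(frozen=True)
class ConfusionCounts:
    """Patient-level agreement between NLP output and the reference standard."""
    tp: int  # both flag multimorbidity
    fp: int  # NLP flags it, reference does not
    fn: int  # reference flags it, NLP does not
    tn: int  # neither flags it


def has_multimorbidity(conditions: Set[str]) -> bool:
    # Assumption taken from the eligibility criteria: >= 2 chronic conditions.
    return len(conditions) >= 2


def tally(pairs: Iterable[Tuple[Set[str], Set[str]]]) -> ConfusionCounts:
    """Count agreement over (nlp_conditions, reference_conditions) pairs."""
    tp = fp = fn = tn = 0
    for nlp_conditions, reference_conditions in pairs:
        predicted = has_multimorbidity(nlp_conditions)
        actual = has_multimorbidity(reference_conditions)
        if predicted and actual:
            tp += 1
        elif predicted:
            fp += 1
        elif actual:
            fn += 1
        else:
            tn += 1
    return ConfusionCounts(tp, fp, fn, tn)


def validity_metrics(c: ConfusionCounts) -> Dict[str, float]:
    """Sensitivity, specificity, PPV, and F1 from confusion counts."""
    def ratio(numerator: int, denominator: int) -> float:
        # Return NaN instead of raising when a metric is undefined.
        return numerator / denominator if denominator else float("nan")

    return {
        "sensitivity": ratio(c.tp, c.tp + c.fn),
        "specificity": ratio(c.tn, c.tn + c.fp),
        "ppv": ratio(c.tp, c.tp + c.fp),
        "f1": ratio(2 * c.tp, 2 * c.tp + c.fp + c.fn),
    }


# Invented example: four patients, one in each cell of the confusion matrix.
example_pairs = [
    ({"diabetes", "hypertension"}, {"diabetes", "hypertension"}),
    ({"diabetes"}, {"diabetes", "copd"}),
    ({"asthma", "copd"}, {"asthma"}),
    (set(), {"hypertension"}),
]
print(validity_metrics(tally(example_pairs)))
```

Studies may instead report these metrics per condition rather than per patient; the same counts-to-metrics step would apply to each condition separately.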
Discussion
This review will establish the current state of evidence on NLP for multimorbidity assessment, identify best practices and challenges, and guide future research efforts in this emerging field. Findings will inform the development and implementation of improved methods for extracting multimorbidity information from unstructured clinical text, potentially enhancing risk stratification, care planning, and research for patients with multiple chronic conditions.