Evaluating Large Language Models for Translating Multimodal Phenotype Documentations into Executable EHR Phenotyping Algorithms
Abstract
Research applications of electronic health record (EHR) phenotypes require translating clinical definitions into executable EHR database queries, a labor-intensive process. We evaluated two frontier large language models across five phenotypes and three documentation modalities. Both models captured high-level logic from structured text but degraded markedly with diagram-only input. Error analysis revealed seven failure categories. Documentation, rather than model capability, was the primary bottleneck, reinforcing the need for standardization and expert oversight.