LabQAR: A Manually Curated Dataset for Question Answering on Laboratory Test Reference Ranges and Interpretation
Abstract
Laboratory tests are crucial for diagnosing and managing health conditions, and reference ranges are essential for interpreting their results. The diversity of lab tests, together with variables such as specimen type (e.g., blood, urine), gender, age, and other factors such as pregnancy, makes automated interpretation challenging. Automated clinical decision support systems that interpret these values must account for such nuances to avoid misdiagnoses or incorrect clinical decisions. To this end, we present LabQAR (Laboratory Question Answering with Reference Ranges), a manually curated dataset of 550 lab test reference ranges derived from authoritative medical sources. It covers 363 unique lab tests and includes multiple-choice questions annotated with reference ranges, specimen types, and other factors that affect interpretation. We also assess several large language models (LLMs), including LLaMA 3.1, GatorTronGPT, GPT-3.5, GPT-4, and GPT-4o, on predicting reference ranges and classifying results as normal, low, or high. The findings indicate that GPT-4o outperforms the other models, showcasing the potential of LLMs in clinical decision support.
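The classification task described above can be illustrated with a minimal Python sketch. It interprets a result against a reference range that depends on specimen type and gender, and it scores a list of predicted labels against gold labels. Everything in it is a hypothetical placeholder: the `ReferenceRange` record, the `find_range`, `classify`, and `accuracy` helpers, the test name `test_a`, and its bounds. None of these reflect LabQAR's actual schema or data, none are clinical reference ranges, and the sketch does not reproduce the paper's LLM evaluation protocol. Age or pregnancy could be added as further matching fields in the same way.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ReferenceRange:
    """Hypothetical reference-range entry conditioned on interpretation factors."""
    test: str
    specimen: str          # e.g. "blood", "urine"
    gender: Optional[str]  # None means the range applies regardless of gender
    low: float
    high: float
    unit: str


def find_range(ranges: List[ReferenceRange], test: str, specimen: str,
               gender: str) -> Optional[ReferenceRange]:
    """Return the most specific matching range, or None if nothing matches.

    A gender-specific entry is preferred over a gender-agnostic one.
    """
    candidates = [r for r in ranges
                  if r.test == test and r.specimen == specimen
                  and (r.gender is None or r.gender == gender)]
    if not candidates:
        return None
    # False sorts before True, so gender-specific entries come first.
    candidates.sort(key=lambda r: r.gender is None)
    return candidates[0]


def classify(value: float, rng: ReferenceRange) -> str:
    """Label a result as "low", "high", or "normal" (bounds are inclusive)."""
    if value < rng.low:
        return "low"
    if value > rng.high:
        return "high"
    return "normal"


def accuracy(predicted: List[str], gold: List[str]) -> float:
    """Fraction of predicted labels that match the gold labels."""
    if not gold or len(predicted) != len(gold):
        raise ValueError("predicted and gold must be non-empty and equal length")
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


# Placeholder entries for illustration only; these are not real lab values.
RANGES = [
    ReferenceRange("test_a", "blood", "female", 10.0, 20.0, "units"),
    ReferenceRange("test_a", "blood", "male", 12.0, 22.0, "units"),
    ReferenceRange("test_a", "urine", None, 0.0, 5.0, "units"),
]

if __name__ == "__main__":
    # The same value is interpreted differently depending on gender.
    print(classify(11.0, find_range(RANGES, "test_a", "blood", "female")))  # normal
    print(classify(11.0, find_range(RANGES, "test_a", "blood", "male")))    # low
    # The urine entry has no gender condition, so it matches any gender.
    print(classify(6.0, find_range(RANGES, "test_a", "urine", "male")))     # high
```

The lookup returns the most specific matching entry, which mirrors the point in the abstract that a single value can be normal for one population and abnormal for another.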