LabQAR: A Manually Curated Dataset for Question Answering on Laboratory Test Reference Ranges and Interpretation

Balu Bhasuran
Qiao Jin
Angelique Deville
Yonghui Wu
Karim Hanna
Zhiyong Lu
Zhe He


Abstract

Laboratory tests are crucial for diagnosing and managing health conditions, and their results are interpreted against reference ranges. The diversity of lab tests, whose reference ranges vary with specimen type (e.g., blood, urine), gender, age, and other factors such as pregnancy, makes automated interpretation challenging. Automated clinical decision support systems that interpret these values must account for such nuances to avoid misdiagnoses or incorrect clinical decisions. To this end, we present LabQAR (Laboratory Question Answering with Reference Ranges), a manually curated dataset comprising 550 lab test reference ranges derived from authoritative medical sources. It covers 363 unique lab tests and includes multiple-choice questions with annotations on reference ranges, specimen types, and other factors affecting interpretation. We also assess the performance of several large language models (LLMs), including LLaMA 3.1, GatorTronGPT, GPT-3.5, GPT-4, and GPT-4o, in predicting reference ranges and classifying results as normal, low, or high. The findings indicate that GPT-4o outperforms the other models, showcasing the potential of LLMs in clinical decision support.
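The classification task described in the abstract (labeling a result as normal, low, or high against a reference range that depends on specimen type and patient factors) can be illustrated with a minimal sketch. This is not the LabQAR schema or the authors' evaluation code: the `ReferenceRange` fields, the `find_range` and `classify` helpers, and the analyte name and numeric bounds are all hypothetical placeholders chosen for illustration, not clinical values.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ReferenceRange:
    """Hypothetical record layout; not the actual LabQAR annotation format."""
    test: str
    specimen: str
    low: float
    high: float
    unit: str
    sex: Optional[str] = None  # None means the range applies regardless of sex


def find_range(ranges: List[ReferenceRange], test: str, specimen: str,
               sex: Optional[str] = None) -> Optional[ReferenceRange]:
    """Return the most specific matching range, or None if nothing matches."""
    candidates = [
        r for r in ranges
        if r.test == test and r.specimen == specimen and r.sex in (None, sex)
    ]
    # Sex-specific ranges sort first (False < True), so they win over generic ones.
    candidates.sort(key=lambda r: r.sex is None)
    return candidates[0] if candidates else None


def classify(value: float, ref: ReferenceRange) -> str:
    """Label a value as 'low', 'normal', or 'high' (bounds are inclusive)."""
    if value < ref.low:
        return "low"
    if value > ref.high:
        return "high"
    return "normal"


# Placeholder data: made-up analyte and bounds, NOT real reference ranges.
EXAMPLE_RANGES = [
    ReferenceRange("analyte_x", "blood", 10.0, 20.0, "units"),
    ReferenceRange("analyte_x", "blood", 12.0, 22.0, "units", sex="female"),
]

if __name__ == "__main__":
    ref = find_range(EXAMPLE_RANGES, "analyte_x", "blood", sex="female")
    if ref is not None:
        print(classify(11.0, ref))
```

The same value can receive different labels depending on which range applies, which is the nuance the abstract says automated systems must handle.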

Version published to 10.1101/2025.06.03.25328882 on medRxiv
Jun 3, 2025

Related articles

Screenathon 2.0: Human–AI Collaborative Screening Applied to Patient-Generated Health Data

This article has 11 authors:
1. Jonas Bergmann
2. Tiago Azzi
3. Rutger Chris Neeleman
4. Kianush Monschau
5. Elena Jalsovec
6. Emily Westerbeek
7. Felix Weijdema
8. Jonathan de Bruin
9. Qixiang Fang
10. Rens van de Schoot
11. Berke Yazan
This article has no evaluations. Latest version: Jan 9, 2026

How to Evaluate Medical AI

This article has 8 authors:
1. Ilia Kopanichuk
2. Petr Anokhin
3. Vladimir Shaposhnikov
4. Vladimir Makharev
5. Ekaterina Tsapieva
6. Iaroslav Bespalov
7. Dmitry Dylov
8. Ivan Oseledets
This article has no evaluations. Latest version: Jan 22, 2026

Artificial Intelligence in Clinical Practice: Evaluating Chatbot Performance on Board-Level Questions in Geriatrics

This article has 2 authors:
1. Mert Zure
2. Metin Sökmen
This article has no evaluations. Latest version: Jan 21, 2026
