Leveraging Large Language Models to Construct Feedback from Medical Multiple-Choice Questions


Abstract

Exams such as the formative Progress Test Medizin (PTM) can become more effective when they offer feedback that goes beyond numerical scores. Content-based feedback, which encompasses relevant information drawn from the exam questions, can be valuable to students by giving them insight into their performance on the current exam and by serving as a study aid and revision tool.

Our goal was to use Large Language Models (LLMs) to prepare content-based feedback for the Progress Test Medizin and to evaluate their effectiveness at this task.

We used two popular LLMs and conducted a comparative assessment by computing the textual similarity between their generated outputs. In addition, we surveyed medical practitioners and medical educators on how they assess the capabilities of LLMs and how they perceive the use of LLMs for generating content-based feedback for PTM exams.

Our findings show that the two examined LLMs performed similarly, each with its own advantages and disadvantages. The survey results indicate that one LLM produces slightly better outputs; however, this comes at a cost, since it is a paid service while the other is free to use. Overall, the medical practitioners and educators who participated in the survey found the generated feedback relevant and useful, and they are open to using LLMs for such tasks in the future.

We conclude that although the content-based feedback generated by the LLMs may not be perfect, it can nevertheless be considered a valuable addition to the numerical feedback currently provided.
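
The abstract does not specify which textual-similarity measure was used to compare the two LLMs' outputs. As a purely illustrative sketch, one common approach is cosine similarity over TF-IDF vectors; the function and example texts below are hypothetical and not taken from the study.

```python
# Minimal sketch: comparing two LLM-generated feedback texts by textual similarity.
# TF-IDF cosine similarity is assumed here for illustration only; the paper's
# actual metric may differ.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def feedback_similarity(text_a: str, text_b: str) -> float:
    """Return cosine similarity between two feedback texts (0 = no overlap, 1 = identical wording)."""
    vectors = TfidfVectorizer().fit_transform([text_a, text_b])
    return float(cosine_similarity(vectors[0], vectors[1])[0, 0])


# Hypothetical example: feedback on the same exam question from two different LLMs.
llm_a_feedback = "The correct answer is B because beta-blockers reduce myocardial oxygen demand."
llm_b_feedback = "Option B is correct: beta-blockers lower heart rate and myocardial oxygen demand."
print(f"Similarity: {feedback_similarity(llm_a_feedback, llm_b_feedback):.2f}")
```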
