Generating learning guides for medical education with LLMs and statistical analysis of test results

Abstract

Background

The Progress Test Medizin (PTM) is a formative test for medical students issued twice a year by the Charité-Universitätsmedizin Berlin. The PTM provides numerical feedback based on a global view of students' strengths and weaknesses. This feedback would benefit from more fine-grained information pinpointing the topics where students need to improve, as well as advice on what they should learn in light of their results. The scale of the PTM, taken by more than 10,000 participants every academic semester, makes it necessary to automate this task.

Methods

We developed a seven-step approach based on large language models and statistical analysis to generate this feedback. First, a large language model (ChatGPT 4.0) extracted keywords, in the form of Medical Subject Headings (MeSH) terms, from all 200 questions of one PTM run. These keywords were then validated against the MeSH thesaurus published by the National Library of Medicine (NLM). In parallel, answer patterns on the PTM questions were analysed to find empirical relationships between questions. With this information, we obtained sets of questions related to specific MeSH terms and used them to build a framework for assessing the performance of PTM participants and composing personalized feedback structured around a curated list of medical topics.
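To make the keyword-validation and grouping steps concrete, the following Python sketch filters LLM-extracted keywords against the MeSH vocabulary and inverts the mapping so that each MeSH term points to the PTM questions tagged with it. This is a minimal illustration under our own assumptions; the vocabulary file format, all function names, and the sample data are hypothetical, not the authors' implementation.

```python
# Minimal sketch of keyword validation against MeSH and grouping of
# questions by MeSH term. File format, names, and data are assumptions.
from collections import defaultdict

def load_mesh_terms(path: str) -> set[str]:
    """Load one MeSH descriptor per line (e.g. from an NLM export)."""
    with open(path, encoding="utf-8") as f:
        return {line.strip().lower() for line in f if line.strip()}

def validate_keywords(llm_keywords: dict[int, list[str]],
                      mesh_terms: set[str]) -> dict[int, list[str]]:
    """Keep only LLM-extracted keywords that are genuine MeSH descriptors."""
    return {q: [k for k in kws if k.lower() in mesh_terms]
            for q, kws in llm_keywords.items()}

def group_by_term(validated: dict[int, list[str]]) -> dict[str, list[int]]:
    """Invert the mapping: each MeSH term -> the questions tagged with it."""
    groups = defaultdict(list)
    for q, kws in validated.items():
        for k in kws:
            groups[k.lower()].append(q)
    return dict(groups)

if __name__ == "__main__":
    mesh = {"hypertension", "asthma", "myocardial infarction"}
    extracted = {1: ["Hypertension", "Pressure Cuff Lore"],  # hallucinated term
                 2: ["Asthma"],
                 3: ["Hypertension", "Asthma"]}
    print(group_by_term(validate_keywords(extracted, mesh)))
    # {'hypertension': [1, 3], 'asthma': [2, 3]}
```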

Results

We used data from a past PTM to simulate the generation of personalized feedback for 1,401 test participants, producing, for each participant, specific information about their knowledge of between 34 and 243 topics. Substantial knowledge gaps were found in 14.67% to 21.76% of the rated learning topics, depending on the benchmarking set considered.
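The abstract does not state how a "substantial knowledge gap" is operationalized. A plausible minimal sketch, assuming a topic is flagged when a student's proportion of correct answers falls well below a cohort benchmark, is shown below; the scoring rule, the margin, and the sample data are assumptions for illustration, not the paper's actual criterion.

```python
# Hypothetical per-topic gap flagging: a topic counts as a gap when the
# student's proportion correct is below the cohort benchmark by a margin.
def topic_gaps(student_answers: dict[str, list[bool]],
               benchmark: dict[str, float],
               margin: float = 0.15) -> list[str]:
    """Return MeSH topics where the student scores well below the benchmark."""
    gaps = []
    for topic, answers in student_answers.items():
        if not answers:
            continue  # topic not rated: no questions answered
        score = sum(answers) / len(answers)
        if score < benchmark.get(topic, 0.0) - margin:
            gaps.append(topic)
    return gaps

if __name__ == "__main__":
    answers = {"hypertension": [True, False, False], "asthma": [True, True]}
    cohort = {"hypertension": 0.70, "asthma": 0.60}
    print(topic_gaps(answers, cohort))  # ['hypertension']
```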

Conclusion

We designed and tested a method to generate student feedback covering up to 243 medical topics defined by MeSH terms. The feedback generated with data from students in later stages of their studies was more detailed, because these students tend to face more questions matching their knowledge level.
