Large Language Models Can Extract Metadata for Annotation of Human Neuroimaging Publications

Matthew D. Turner
Abhishek Appaji
Nibras Ar Rakib
Pedram Golnari
Arcot K. Rajasekar
K V Anitha Rathnam
Satya S. Sahoo
Yue Wang
Lei Wang
Jessica A. Turner

Abstract

We show that recent (mid-to-late 2024) commercial large language models (LLMs) can perform good-quality metadata extraction and annotation, with very little effort from investigators, on several exemplar real-world annotation tasks in the neuroimaging literature. We investigated OpenAI's GPT-4o, which performed comparably to several groups of specially trained and supervised human annotators. The LLM achieves performance similar to that of humans, scoring between 0.91 and 0.97 on zero-shot prompts with no feedback given to the LLM. Reviewing the disagreements between the LLM and the gold-standard human annotations, we note that actual LLM errors are comparable to human errors in most cases, and that in many cases the disagreements are not errors at all. Based on the specific types of annotations we tested, which have exceptionally well-reviewed gold-standard values, the LLM's performance is usable for metadata annotation at scale. We encourage other research groups to develop and make available more specialized “micro-benchmarks,” like the ones we provide here, for testing the performance of both LLMs and more complex agent systems on real-world metadata annotation tasks.

Version published to 10.1101/2025.05.13.653828v1 on bioRxiv
May 14, 2025

Related articles

Improving CXR Report Labeling Through LLM Fine-Tuning and Human Feedback

This article has 1 author:
1. Donald Martin
This article has no evaluations. Latest version: Apr 21, 2025

Automated Detection of Early-Stage Dementia Using Large Language Models: A Comparative Study on Narrative Speech

This article has 3 authors:
1. Kevin Mekulu
2. Faisal Aqlan
3. Hui Yang
This article has no evaluations. Latest version: Jun 7, 2025

Detection of patient metadata in published articles for genomic epidemiology using machine learning and large language models

This article has 10 authors:
1. Ari Z. Klein
2. Davy Weissenbacher
3. Karen O’Connor
4. Amir Elyaderani
5. Ivan Flores Amaro
6. Takeshi Onishi
7. Su Golder
8. Kaelen Spiegel
9. Matthew Scotch
10. Graciela Gonzalez-Hernandez
This article has no evaluations. Latest version: Apr 28, 2025
