Accelerating metadata annotation in collaborative research centers: A hybrid AI workflow for biomedical entities
Abstract
Background
Collaborative Research Centers rely on FAIR-compliant, richly structured metadata, yet manual annotation is a major bottleneck. We implemented an AI- and search-augmented large language model (LLM) workflow within a local research data management system to pre-annotate biomedical entities, with human-in-the-loop verification to ensure data quality.

Methods
The pipeline uses Gemini 3.0 Pro in a two-step prompting strategy: (1) identify dataset deposits and stable identifiers in articles converted to Markdown; (2) extract structured fields from curated repository landing pages rendered via a headless browser. To respect a highly hierarchical metadata schema, we flattened the schema for prompting and remapped outputs to strict JSON with granular provenance tags. Authors received pre-filled metadata and could accept, edit, or delete each entry; these actions were mapped to true positives (TP), false positives (FP), and false negatives (FN). Performance metrics (precision, recall, F1) were estimated as proportions and synthesized via random-effects meta-analysis. The workflow was rolled out in December 2025, with reminders at 5 and 10 weeks.

Results
Among 51 screened articles (40 original articles, 11 review articles), the LLM identified a repository deposit in 31 articles; authors responded for 17 of these (55%), yielding 39 human-verified datasets. Across the 39 verified datasets, true positives averaged 13.15 (SD 4.57; range 6–27). False positives were rare, with a mean of 0.23 (SD 0.58; range 0–2), and false negatives were also low, with a mean of 1.46 (SD 1.93; range 0–6). Precision was consistently high across datasets, with an overall random-effects estimate of 99.65% (95% CI 98.42% to 100.00%) and no detectable heterogeneity (I² = 0.00%). Recall was more variable, with an overall estimate of 93.75% (95% CI 89.79% to 96.96%) and moderate heterogeneity (I² = 55.08%). Combined performance, expressed as the F1 score, yielded an overall estimate of 96.17% (95% CI 93.78% to 98.11%).
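The per-dataset metrics can be sketched as follows. This is a minimal illustration of how precision, recall, and F1 are derived from verified TP/FP/FN counts, not the study's analysis code; the random-effects synthesis of the per-dataset proportions is omitted, and the example counts are hypothetical.

```python
def metrics(tp: int, fp: int, fn: int) -> dict:
    """Precision, recall, and F1 from human-verified annotation counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"precision": precision, "recall": recall, "f1": f1}

# Hypothetical dataset: 13 accepted entries, 0 spurious, 1 missed.
print(metrics(tp=13, fp=0, fn=1))
```

In the reported results, precision is near its ceiling because false positives are rare, while the larger and more variable false-negative counts drive the heterogeneity seen in recall.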
Conclusions
The hybrid workflow achieved very high precision with moderately variable recall, effectively shifting effort from drafting to reviewing while preserving schema compliance. However, the modest author response rate limits sample size and generalizability; broader engagement and multi-site validation are needed to confirm robustness across domains.
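The schema flattening and remapping step described in the Methods can be sketched as below. This is a minimal illustration under stated assumptions: the schema is flattened to dot-separated key paths for prompting, and LLM output keyed by those paths is remapped into the original hierarchy. The field names are hypothetical, not the system's actual metadata schema, and provenance tagging is omitted.

```python
def flatten(schema: dict, prefix: str = "") -> dict:
    """Flatten a nested schema into dot-separated key paths."""
    flat = {}
    for key, value in schema.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat

def remap(flat: dict) -> dict:
    """Rebuild the original hierarchy from dot-separated key paths."""
    nested: dict = {}
    for path, value in flat.items():
        node = nested
        *parents, leaf = path.split(".")
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    return nested

# Hypothetical example of a round trip through the two steps.
schema = {"dataset": {"repository": "GEO", "accession": "GSE000000"}}
assert remap(flatten(schema)) == schema
```

Remapping the flat output back into the hierarchy is what lets the pipeline emit strict, schema-compliant JSON even though the prompt itself only sees a flat list of fields.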