MetaBeeAI: an AI pipeline for full-text systematic reviews in biology
Abstract
The volume and complexity of scientific literature are expanding rapidly, making it increasingly difficult to extract and synthesise information across studies. This challenge is particularly evident in the biological sciences, where data span from molecules to ecosystems and linking across hierarchical levels is inherently challenging. Large Language Model (LLM) pipelines offer a scalable solution, but most lack modularity, transparency, and mechanisms for human oversight. We present MetaBeeAI, an open-source, modular pipeline that uses LLMs to extract structured information from scientific papers for systematic review and meta-analysis. The system includes an intuitive interface that displays model outputs alongside the source text, allowing users to inspect, correct, and iteratively improve performance. MetaBeeAI produces an auditable, machine-readable record of prompts, configuration settings, and expert annotations, supporting reliable replication and continual refinement. We evaluated the pipeline on 924 research papers, extracting information on bee species, pesticides, exposure methodologies, and other environmental factors. Results highlight the value of expert-in-the-loop validation for prompt optimisation and show that MetaBeeAI can handle heterogeneous experimental designs and biological contexts. MetaBeeAI provides a general framework for structured knowledge extraction, enabling scalable, transparent, and reproducible evidence synthesis, and new approaches to accelerate discovery in the life sciences using AI.