An LLM-based Multi-Agent Collaborative Approach for Abstract Screening towards Automated Systematic Reviews

Abstract

Objective

Systematic reviews (SRs) are essential for evidence-based practice but remain labor-intensive, especially during title and abstract screening. This study evaluates whether collaboration among multiple large language models (multi-LLM collaboration) can improve screening prioritization while reducing cost.

Methods

Abstract screening was framed as a question-answering (QA) task using cost-effective LLMs. Three multi-LLM collaboration strategies were proposed and evaluated: majority voting, which averages the opinions of peer models; multi-agent debate (MAD), in which agents iteratively refine their answers; and LLM-based adjudication over the answers of individual QA baselines. These strategies were evaluated on the CLEF eHealth 2019 Technology-Assisted Review benchmark using the domain's standard performance metrics: Mean Average Precision (MAP), Recall@k%, and Work Saved over Sampling (WSS).
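
To make the majority-voting strategy concrete, a minimal Python sketch follows. It is an illustration rather than the authors' implementation: the Peer interface, the scoring convention (each model returns a relevance probability in [0, 1]), and all function names are assumptions.

    from statistics import mean
    from typing import Callable

    # A "peer" is any callable mapping (review question, abstract text) to a
    # relevance probability in [0, 1] -- e.g., a thin wrapper around one LLM's
    # QA prompt. Hypothetical interface, not taken from the paper.
    Peer = Callable[[str, str], float]

    def majority_vote_score(question: str, abstract: str, peers: list[Peer]) -> float:
        """Average the relevance opinions of all peer models for one abstract."""
        return mean(peer(question, abstract) for peer in peers)

    def prioritize(question: str, abstracts: list[str], peers: list[Peer]) -> list[str]:
        """Rank abstracts for screening, highest averaged relevance first."""
        return sorted(
            abstracts,
            key=lambda a: majority_vote_score(question, a, peers),
            reverse=True,
        )

Ranking abstracts by the averaged peer score yields the prioritized screening order on which metrics such as MAP, Recall@k%, and WSS are computed.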

Results

Multi-LLM collaboration significantly outperformed the QA baselines. Majority voting was the best collaboration strategy, achieving the highest MAP (0.4621 and 0.3409 on the subsets of SRs on clinical interventions and diagnostic technology assessment, respectively) with WSS@95% of 0.6064 and 0.6798, in theory enabling up to 68% workload reduction while still retrieving 95% of all included studies. MAD improved weaker models the most. The adjudicator-as-a-ranker method surpassed adjudicator-as-a-judge and was the second-strongest approach, but at a significantly higher cost than majority voting and debate.
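
To unpack the headline workload figure, assuming the standard definition of the metric: WSS at recall level r is WSS@r = (TN + FN)/N − (1 − r), i.e., the fraction of records left unscreened minus the (1 − r) fraction a random sample would be expected to leave. At r = 0.95, a WSS@95% of 0.6798 implies (TN + FN)/N ≈ 0.73, so roughly 73% of records need not be screened manually to retrieve 95% of the included studies, a saving of about 68 percentage points over the sampling baseline.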

Conclusion

Multi-LLM collaboration can substantially improve abstract-screening efficiency, and its success lies in model diversity. By making the best use of that diversity, majority voting stands out for both excellent performance and low cost compared with adjudication. Despite context-dependent gains and diminishing model diversity, MAD remains a cost-effective strategy and a promising direction for further research.
