Computational Review of Technology-Assisted Medical Evidence Synthesis through Human-LLM Collaboration: A Case Study of Cochrane
Abstract
Medical evidence synthesis (MES), typically carried out through systematic reviews, requires extensive manual effort across stages such as searching, screening, extraction, and synthesis, which makes reviews slow and costly. These limitations hinder timely updates and rapid responses during health crises. Interest in technology-assisted evidence synthesis has been increasing, driven by artificial intelligence (AI) and large language models (LLMs). In 2024, four major networks (Cochrane, the Campbell Collaboration, the Joanna Briggs Institute, and the Collaboration for Environmental Evidence) jointly launched an AI Methods Group to advance automation in evidence synthesis. This chapter presents a large-scale computational analysis of technology-assisted MES across 7,271 Cochrane reviews (2010–2024), identifying computer tools (software, packages, or algorithmic implementations) used at different review stages via an LLM-human collaborative annotation pipeline. A multi-LLM mechanism combining suggestion, verification, and self-critical questioning achieved high-recall tool extraction. Evaluation against five “gold-standard” tool lists showed major gains: approximately 100 additional tools were identified relative to each of the existing review-based, database-based, and Cochrane-curated gold standards. In total, a list of 514 tools was compiled. Two annotators verified all candidates within two days, demonstrating notable efficiency. A follow-up bibliometric analysis provides the first computational map of technology use in Cochrane evidence synthesis, revealing trends across time, domains, and regions.
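As a rough illustration of the suggestion, verification, and self-critical questioning idea, the sketch below chains the three steps over a piece of review text and then compares the result with a reference list. It is not the authors' pipeline: the function names, the callable signatures standing in for LLM calls, and the name normalization are all assumptions made for this example.

```python
from typing import Callable, Iterable, List

# Hypothetical stand-ins for LLM calls (assumptions, not the paper's API).
Suggester = Callable[[str], Iterable[str]]   # text -> candidate tool names
Verifier = Callable[[str, str], bool]        # (text, name) -> supported by the text?
Critic = Callable[[str, str], bool]          # (text, name) -> concrete objection raised?


def normalize(name: str) -> str:
    """Lowercase and collapse whitespace so name variants compare equal."""
    return " ".join(name.lower().split())


def extract_candidate_tools(
    text: str,
    suggesters: List[Suggester],
    verifier: Verifier,
    critic: Critic,
) -> List[str]:
    """Return tool names that survive verification and self-critique."""
    # Suggestion: take the union over all suggesters to favor recall.
    candidates = {}
    for suggest in suggesters:
        for name in suggest(text):
            key = normalize(name)
            if key:
                candidates.setdefault(key, name.strip())

    kept = []
    for key in sorted(candidates):
        name = candidates[key]
        # Verification: drop names the text does not support.
        if not verifier(text, name):
            continue
        # Self-critical questioning: drop names the critic objects to.
        if critic(text, name):
            continue
        kept.append(name)
    return kept


def additional_tools(found: Iterable[str], gold: Iterable[str]) -> List[str]:
    """Tools in `found` that are absent from a reference ("gold") list."""
    gold_keys = {normalize(g) for g in gold}
    return sorted(n for n in found if normalize(n) not in gold_keys)
```

In this sketch the surviving candidates would still go to human annotators for a final check, mirroring the human-LLM collaboration described above; `additional_tools` only shows the kind of set comparison that an evaluation against an existing tool list implies.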