Validation of Synthesa AI, a Large Language Model-Based Screening Tool for Systematic Reviews: Results from Nine Studies
Abstract
Systematic review screening is labor-intensive and prone to human error. Synthesa AI, a large language model (LLM)-based tool, was developed to address these challenges through a transparent, prompt-driven approach to abstract screening. In this validation study, Synthesa AI was evaluated against 17 benchmark meta-analyses spanning nine clinical domains. Using user-defined PICOS criteria, the tool screened 270,626 abstracts retrieved from PubMed and Scopus. Synthesa AI identified all 163 benchmark-included studies, yielding a sensitivity of 100% and a pooled specificity of 99.4%, and it reduced reviewer workload by 91.7%, flagging only 1,797 abstracts for manual review. The tool also identified 32 relevant studies that had been missed in the original reviews, a 19.6% increase in evidence yield. These findings indicate that Synthesa AI delivers high precision, efficiency, and reproducibility in systematic review workflows. Its auditable, deterministic architecture adheres to Good Machine Learning Practice (GMLP) guidelines, making it suitable for both academic and regulatory applications. Synthesa AI is a promising option for living systematic reviews and large-scale evidence synthesis initiatives, offering an efficient alternative to traditional human-led screening.
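The pooled accuracy figures follow from a standard screening contingency table. The sketch below is not part of the original study; it simply reconstructs the confusion-matrix counts implied by the abstract, assuming that the 1,797 flagged abstracts contain all 163 benchmark-included studies (consistent with the reported 100% sensitivity) and that every non-flagged abstract was correctly excluded.

```python
# Illustrative reconstruction of the screening confusion matrix from the
# figures reported in the abstract; per-study counts are assumptions.
total_abstracts = 270_626      # abstracts screened from PubMed and Scopus
benchmark_included = 163       # studies included in the benchmark meta-analyses
flagged_for_review = 1_797     # abstracts flagged by Synthesa AI for manual review

# With 100% sensitivity, all benchmark-included studies fall in the flagged set.
tp = benchmark_included                      # true positives
fn = 0                                       # false negatives (100% sensitivity)
fp = flagged_for_review - tp                 # flagged abstracts not in the benchmark
tn = total_abstracts - flagged_for_review    # abstracts correctly excluded

sensitivity = tp / (tp + fn)                 # 1.000 -> 100%
specificity = tn / (tn + fp)                 # ~0.994 -> matches the pooled 99.4%

print(f"Sensitivity: {sensitivity:.1%}")
print(f"Specificity: {specificity:.1%}")
```

Under these assumptions the computed specificity agrees with the reported pooled value; the workload-reduction figure is quoted directly from the study and may be defined against a different denominator than the total abstract count.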