Artificial Intelligence in Systematic Reviews: Overcoming Reproducibility, Bias and Validation Challenges
Abstract
Artificial Intelligence (AI) is rapidly changing how systematic reviews are conducted by accelerating the processes of literature retrieval and screening. While these advancements enhance researchers’ productivity, the complete scope of AI's transformative potential is still emerging. Moreover, issues related to reproducibility, bias, and transparency pose significant barriers to fully integrating AI into evidence synthesis. Large language models and machine learning classifiers show high sensitivity but suffer from low specificity, generating excessive false positives that increase the screening burden rather than reducing it. AI-generated Boolean search strategies often lack stability, frequently delivering inconsistent results for the same prompts, which undermines the core principle of reproducibility. Furthermore, AI models can sometimes "hallucinate," a term used to describe instances where the AI generates false or misleading information. They may also misapply Medical Subject Headings (MeSH) and introduce selection bias, ultimately distorting the outcomes of systematic reviews. This review examines the role of AI in systematic searching and presents a structured validation framework to address these limitations. Establishing standardized benchmarks for reproducibility, managing sensitivity and specificity trade-offs, and developing clear explanatory mechanisms are crucial to ensure that AI is a complementary tool in evidence synthesis, rather than a disruptive force. Retrieval-augmented AI search frameworks can improve precision but require transparent decision-making processes to enhance trust and accountability. Hybrid AI-human workflows, where AI accelerates screening but human experts validate outputs, offer a pragmatic solution to balance efficiency with methodological rigor.
This review presents a comprehensive roadmap emphasizing the importance of interpretability, transparency reports, and ethical oversight to facilitate the responsible integration of AI into systematic reviews. Achieving reproducibility and reducing bias are critical for transforming AI from an experimental tool into a more reliable asset for evidence synthesis.
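The sensitivity and specificity trade-off and the reproducibility concern described above can be made concrete with a small calculation. The sketch below is illustrative only and is not the validation framework proposed in the review: the function names, the record identifiers, and the use of mean pairwise Jaccard overlap as a stability measure for repeated runs of the same prompt are all assumptions made for this example.

```python
from itertools import combinations


def screening_metrics(predicted_include, truly_relevant, all_records):
    """Sensitivity, specificity and precision for one AI screening run.

    All arguments are iterables of record IDs; predicted_include and
    truly_relevant are assumed to be subsets of all_records.
    """
    predicted = set(predicted_include)
    relevant = set(truly_relevant)
    universe = set(all_records)

    tp = len(predicted & relevant)             # relevant and flagged
    fp = len(predicted - relevant)             # flagged but not relevant
    fn = len(relevant - predicted)             # relevant but missed
    tn = len(universe - predicted - relevant)  # correctly excluded

    return {
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
    }


def mean_pairwise_jaccard(runs):
    """Average Jaccard overlap between the result sets of repeated runs.

    A value of 1.0 means every run retrieved the same records; lower
    values indicate unstable (less reproducible) retrieval.
    """
    sets = [set(run) for run in runs]
    pairs = list(combinations(sets, 2))
    if not pairs:
        return 1.0  # a single run is trivially consistent with itself
    scores = [len(a & b) / len(a | b) if (a | b) else 1.0 for a, b in pairs]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Hypothetical corpus of 10 records, 3 of which are truly relevant.
    records = range(1, 11)
    relevant = {1, 2, 3}
    flagged = {1, 2, 3, 4, 5, 6, 7}  # high sensitivity, many false positives
    print(screening_metrics(flagged, relevant, records))

    # Hypothetical results of the same search prompt issued three times.
    repeated_runs = [{1, 2, 3, 4}, {1, 2, 3, 5}, {1, 2, 6, 7}]
    print(round(mean_pairwise_jaccard(repeated_runs), 3))
```

In this toy case the classifier finds every relevant record (sensitivity 1.0) but excludes only three of the seven irrelevant ones, so most of what reaches the human screener is a false positive. The overlap score for the repeated runs is well below 1.0, which is the kind of instability a reproducibility benchmark would need to flag.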