Evaluating Loon Lens Pro™, an AI-Driven Tool for Full-Text Screening in Systematic Reviews: A Validation Study

Abstract

Background

Systematic literature reviews (SLRs) are essential for evidence synthesis but are hampered by the resource-intensive full-text screening phase. Loon Lens Pro™, a publicly available agentic AI tool, automates full-text screening without prior training by using user-defined inclusion/exclusion criteria and multiple specialized AI agents. This study validated Loon Lens Pro™ against human reviewers to assess its accuracy, efficiency, and confidence scoring in screening.

Methods

In this comparative validation study, 84 full-text articles from eight SLRs were screened by both Loon Lens Pro™ and human reviewers, whose decisions served as the gold standard. For each article, the AI provided a binary inclusion/exclusion decision together with a transparent rationale and a confidence rating (low, medium, or high). Performance metrics (accuracy, sensitivity, specificity, negative predictive value, precision, and F1 score) were derived from a confusion matrix. Logistic regression with bootstrap resampling (1,000 iterations) was used to evaluate the association between confidence ratings and screening errors.
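The abstract does not say how the confidence rating entered the regression model. The Python sketch below assumes a logistic regression with the three levels dummy-coded. Under that assumption the model is saturated, so the fitted error probability for each level reduces to that level's observed error rate and no optimizer is needed. It also shows a simple percentile bootstrap (1,000 resamples by default, matching the stated iteration count) and a C-index computed as the concordance between predicted risk and actual errors. All function names and the toy data are hypothetical; this is not the authors' analysis code.

```python
import random
from statistics import mean

# Confidence levels as reported by the tool.
LEVELS = ("low", "medium", "high")


def fitted_error_probs(confidence, error):
    """Fitted P(error | confidence level) from a logistic model with the level
    dummy-coded (assumption). A single categorical predictor gives a saturated
    model, so the maximum-likelihood fitted probability for each level equals
    that level's observed error rate; no iterative optimizer is needed."""
    probs = {}
    for level in LEVELS:
        outcomes = [e for c, e in zip(confidence, error) if c == level]
        if outcomes:  # a level absent from the sample gets no estimate
            probs[level] = sum(outcomes) / len(outcomes)
    return probs


def c_index(scores, error):
    """Share of (error, non-error) pairs in which the error case has the
    higher predicted risk, counting ties as one half (equal to ROC AUC)."""
    pos = [s for s, e in zip(scores, error) if e]
    neg = [s for s, e in zip(scores, error) if not e]
    if not pos or not neg:
        raise ValueError("need at least one error and one correct decision")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_error_probs(confidence, error, n_boot=1000, seed=0):
    """Resample decisions with replacement n_boot times, refit each time, and
    return {level: (mean, 2.5th percentile, 97.5th percentile)}."""
    rng = random.Random(seed)
    n = len(error)
    draws = {level: [] for level in LEVELS}
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        probs = fitted_error_probs([confidence[i] for i in idx],
                                   [error[i] for i in idx])
        for level, p in probs.items():
            draws[level].append(p)
    summary = {}
    for level, values in draws.items():
        if values:
            values.sort()
            k = len(values) - 1
            summary[level] = (mean(values),
                              values[int(0.025 * k)],
                              values[int(0.975 * k)])
    return summary


if __name__ == "__main__":
    # Hypothetical toy data, NOT the study's decisions: 1 = screening error.
    confidence = ["low"] * 4 + ["medium"] * 4 + ["high"] * 8
    error = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1]
    probs = fitted_error_probs(confidence, error)
    print(probs)  # {'low': 0.5, 'medium': 0.25, 'high': 0.125}
    scores = [probs[c] for c in confidence]
    print(round(c_index(scores, error), 3))  # 0.708
    print(bootstrap_error_probs(confidence, error))
```

Dummy coding lets each confidence level take its own error rate rather than forcing equal log-odds steps between low, medium, and high.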

Results

Loon Lens Pro™ correctly classified 70 of 84 full texts, achieving an accuracy of 83.3% (95% CI: 75.0–90.5%), sensitivity of 94.7% (95% CI: 82.4–100%), and specificity of 80.0% (95% CI: 70.1–89.2%). The negative predictive value was 98.1% (95% CI: 93.8–100%), with a precision of 58.1% (95% CI: 41.4–76.0%) and an F1 score of 0.72. Logistic regression revealed a strong inverse relationship between confidence level and error probability: low, medium, and high confidence decisions were associated with predicted error probabilities of 46.9%, 30.9%, and 3.5%, respectively (C-index = 0.87).
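The reported point estimates can be reconciled with a single confusion matrix. The abstract gives only percentages, but the counts 18 true positives, 13 false positives, 1 false negative, and 52 true negatives (treating "include" as the positive class) are consistent with every reported estimate; these counts are back-calculated here, not taken from the paper. The minimal Python sketch below uses illustrative names (`ConfusionMatrix`, `screening_metrics`) that are not part of Loon Lens Pro™, and it assumes every denominator is nonzero.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """AI decisions compared against the human gold standard.

    Positive = "include": tp = both include, fp = AI includes but humans
    exclude, fn = AI excludes but humans include, tn = both exclude.
    """
    tp: int
    fp: int
    fn: int
    tn: int


def screening_metrics(cm: ConfusionMatrix) -> dict:
    """Return the standard screening metrics as proportions in [0, 1].

    Assumes no denominator is zero (e.g. at least one true include).
    """
    total = cm.tp + cm.fp + cm.fn + cm.tn
    sensitivity = cm.tp / (cm.tp + cm.fn)  # recall on includable studies
    specificity = cm.tn / (cm.tn + cm.fp)
    precision = cm.tp / (cm.tp + cm.fp)    # positive predictive value
    npv = cm.tn / (cm.tn + cm.fn)          # negative predictive value
    accuracy = (cm.tp + cm.tn) / total
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "npv": npv,
        "precision": precision,
        "f1": f1,
    }


if __name__ == "__main__":
    # Counts back-calculated from the reported percentages (not stated in the abstract).
    cm = ConfusionMatrix(tp=18, fp=13, fn=1, tn=52)
    for name, value in screening_metrics(cm).items():
        print(f"{name:>11}: {value:.3f}")
```

With these counts the F1 score is exactly 2 × 18 / (2 × 18 + 13 + 1) = 36/50 = 0.72, and the low precision alongside high sensitivity reflects the 13 false inclusions against only 1 missed study.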

Conclusion

Our study provides evidence that Loon Lens Pro™ is a viable and effective tool for automating the full-text screening phase of systematic reviews. Its high sensitivity, robust confidence scoring mechanism, and transparent rationale generation collectively support its potential to alleviate the burden of manual screening without compromising the quality of study selection.