Screenathon 2.0: Human–AI Collaborative Screening Applied to Patient-Generated Health Data

Abstract

Systematic reviews are essential for evidence-based research, yet the traditional screening process is time-consuming and difficult to scale. Human-only screening can introduce inconsistency, while fully automated approaches employing large language models often lack the contextual judgement required for complex decisions. To address this, we introduce a crowd-based screening methodology that integrates human expertise with adaptive machine learning. The method was applied in a large EU project in which experts from 27 collaborating partners jointly screened 5,842 papers across eleven disease topics related to patient-generated health data in only two days. Post-processing played a central role in ensuring data quality, including topic reallocation, targeted full-text verification, and noisy-label filtering. The Screenathon resulted in 487 records being labeled as relevant and 6,463 as irrelevant. The number of records screened per participant ranged from 3 to 2,496, with a mean of 216.4 per screener (SE = 95.19). Survey results indicated increased trust in AI-assisted systematic reviewing after the event, along with generally positive evaluations of usability. The Screenathon demonstrates that human–AI collaboration can increase efficiency while maintaining rigor, provided that workflows include thoughtful training and calibration together with strong post-processing safeguards.
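The abstract describes an adaptive loop in which human screening decisions continually retrain a model that re-prioritises the remaining records. The following is a minimal illustrative sketch of such a loop, assuming a TF-IDF representation and a logistic-regression relevance ranker; the paper does not specify the authors' model or tooling, and the function names and parameters below are hypothetical.

```python
# Illustrative only: assumes a TF-IDF + logistic regression relevance ranker,
# not the authors' actual implementation.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression


def prioritise(unscreened_texts, labelled_texts, labels):
    """Return indices of unscreened records, most-likely-relevant first."""
    # Vectorise titles/abstracts of the records labelled so far.
    vectoriser = TfidfVectorizer(stop_words="english", max_features=20000)
    X_labelled = vectoriser.fit_transform(labelled_texts)
    X_unscreened = vectoriser.transform(unscreened_texts)

    # Fit a simple relevance classifier on the human decisions
    # (labels: 1 = relevant, 0 = irrelevant).
    model = LogisticRegression(max_iter=1000, class_weight="balanced")
    model.fit(X_labelled, labels)

    # Rank the remaining records by predicted probability of relevance.
    relevance = model.predict_proba(X_unscreened)[:, 1]
    return sorted(range(len(unscreened_texts)), key=lambda i: -relevance[i])
```

In a workflow of this kind, each new batch of human decisions is appended to the labelled set and the remaining records are re-ranked, so screeners keep seeing the most promising titles and abstracts first.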
