Opening the ‘black box’ of the silent phase evaluation for artificial intelligence: a scoping review and critical analysis

Abstract

‘Silent’ evaluation refers to the prospective, non-interventional testing of artificial intelligence (AI) model performance in the intended clinical setting without affecting patient care or institutional operations. The silent evaluation phase has received less attention than in silico algorithm development or formal clinical evaluation, despite increasing recognition that it is a critical phase in the effective translation of healthcare AI tools. There are currently no formal guidelines for conducting silent AI evaluations in health settings. We undertook a scoping review to identify silent AI evaluations described in the literature and to summarize current practices for conducting them. We searched the PubMed, Web of Science, and Scopus databases for articles published from 2015 to 2025 that fit our criteria for AI silent evaluations, or ‘silent trials’. A total of 570 articles were identified, and 55 met the criteria for inclusion in the final review. We found wide variance in the terminology, description, and rationale for silent evaluations, which led to substantial heterogeneity in what was reported. The overwhelming majority of papers reported technical performance metrics such as AUC, precision/recall, and positive and negative predictive values. Far fewer studies reported verifying outputs against an in-situ clinical ground truth, and, when reported, the comprehensiveness of such verification was highly variable. We noted a large gap in descriptions of sociotechnical components such as stakeholder engagement and human-computer interaction elements. We conclude that these gaps mirror challenges in the effective translation of AI tools from ‘lab to bedside’, and we identify opportunities to improve silent evaluation protocols so that they address key translational needs. This matters because healthcare organizations and regulatory bodies worldwide are seeking guidance on gathering meaningful evidence of the impact of AI tools on clinical practice.
