Deep Learning in Clinical Diagnostics: A Scoping Review of Innovations Shaping Future Healthcare Delivery

Abstract

Background Deep learning (DL)-based diagnostic systems may offer automated image and signal interpretation and workflow support across a wide range of clinical fields, but the evidence for clinical translation is mixed. We conducted a scoping review to identify validation methods, evidence for implementation, and common methodological and operational challenges in current DL-based diagnostic research.

Methods Following PRISMA-ScR guidelines, we searched key bibliographic databases for peer-reviewed articles (2020–2025) describing DL models for diagnostic tasks producing quantitative results. Two reviewers independently screened the records and extracted study characteristics into a standardized data extraction form (author, year, country, domain, task, sample size, model type, comparator, metrics, validation method, setting, key findings, reported challenges). We classified each study by its most advanced stage of translation (development, external validation, prospective testing, randomized trial, post-deployment). No formal risk of bias assessment was conducted.

Results Twenty-four studies met the inclusion criteria, spanning radiology (breast imaging, chest X-ray), gastroenterology (computer-aided detection [CADe] in colonoscopy), ophthalmology (retinal screening/prognostics), dermatology (dermoscopy), and cardiology (echocardiography/ECG). By translation stage, the studies were distributed as follows: prospective clinical testing n = 8 (33.3%), randomized trials n = 5 (20.8%), post-deployment/real-world auditing n = 4 (16.7%), external validation n = 2 (8.3%), and development/retrospective studies n = 5 (20.8%). High-performing examples from prospective or rollout studies included higher cancer detection in screening mammography, randomized evidence of higher adenoma detection with CADe, large-scale chest X-ray (CXR) screening for tuberculosis with AUCs > 0.98 and workload reductions of up to ~80%, and echocardiography automation performing comparably to experts. Frequently cited issues were a lack of geographic and demographic diversity, spectrum bias, inconsistent external validation, underreporting of clinically relevant operating points and calibration, gaps in explainability, barriers to workflow integration, and variable transparency about regulatory status and conflicts of interest (COI).

Conclusion DL diagnostic platforms have evidence of clinical utility in a number of applications (screening mammography, colonoscopy CADe, programmatic CXR screening) where prospective trials or deployments are available. Safe, widespread adoption, however, would require standardized external validation, prospective outcome studies, equity-focused subgroup analyses, routine post-deployment monitoring, and transparent reporting of thresholds and governance.
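
The translation-stage percentages reported in the Results can be reproduced with simple arithmetic. The following is a minimal sketch only: the counts are taken from the abstract, while the variable names (`stage_counts`, `stage_share`) are illustrative and not part of the review itself.

```python
# Number of studies per most advanced translation stage, as reported in the abstract.
stage_counts = {
    "prospective clinical testing": 8,
    "randomized trial": 5,
    "post-deployment/real-world auditing": 4,
    "external validation": 2,
    "development/retrospective": 5,
}

# Total number of included studies (the abstract reports twenty-four).
total = sum(stage_counts.values())

# Percentage share of each stage, rounded to one decimal place as in the abstract.
stage_share = {
    stage: round(100 * n / total, 1) for stage, n in stage_counts.items()
}

for stage, n in stage_counts.items():
    print(f"{stage}: n = {n} ({stage_share[stage]}%)")
print(f"total studies: {total}")
```

Because each study is assigned to exactly one stage, the counts sum to the number of included studies; the rounded percentages need not sum to exactly 100.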