From Images to Reports: The Future of Deep Learning in Radiology Report Generation
Abstract
The increasing workload of radiologists, coupled with the growing volume of medical imaging data, has necessitated the development of automated solutions for radiology report generation. Deep learning has emerged as a promising approach for generating structured and accurate radiology reports by leveraging medical imaging data and natural language processing (NLP) techniques. This systematic review provides a comprehensive analysis of deep learning-based research on radiology report generation, covering key datasets, model architectures, evaluation metrics, challenges, and future directions.

A critical component of this review is the discussion of publicly available datasets, such as MIMIC-CXR, IU-XRay, and CheXpert, which have been widely used to train and evaluate deep learning models. These datasets provide valuable radiology image-text pairs, enabling researchers to develop AI-driven reporting systems. However, challenges such as data scarcity, domain-specific variability, and privacy concerns limit the generalizability of existing models.

From a methodological perspective, recent advances in deep learning have significantly enhanced the performance of radiology report generation models. Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs) serve as backbone architectures for medical image feature extraction, while natural language generation is handled by advanced transformer-based language models such as BERT, GPT, and T5. Additionally, multimodal approaches, which integrate visual and textual representations, have been increasingly adopted to improve the coherence and clinical accuracy of generated reports. Self-supervised and few-shot learning techniques have also emerged as potential solutions to the problem of data scarcity, enabling models to learn meaningful representations from limited labeled data.

Evaluation of radiology report generation remains a complex and challenging task.
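The multimodal fusion step described above, in which decoder token states attend over image features extracted by a CNN or ViT backbone, can be sketched as scaled dot-product cross-attention. The following is an illustrative NumPy sketch, not the implementation of any specific published model; the shapes (a 7×7 patch grid, 64-dimensional features, 12 decoder tokens) are arbitrary assumptions chosen for demonstration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(text_states, image_feats):
    """Text-decoder states (queries) attend over image patch features
    (keys/values), producing visually grounded token representations."""
    d_k = image_feats.shape[-1]
    scores = text_states @ image_feats.T / np.sqrt(d_k)  # (T_text, N_patches)
    weights = softmax(scores, axis=-1)                   # each row sums to 1
    return weights @ image_feats, weights                # fused states, attn map

rng = np.random.default_rng(0)
image_feats = rng.normal(size=(49, 64))  # e.g. a 7x7 patch grid from a CNN/ViT
text_states = rng.normal(size=(12, 64))  # hidden states for 12 report tokens
fused, weights = cross_attention(text_states, image_feats)
print(fused.shape)  # (12, 64)
```

In full models this operation is stacked inside transformer decoder layers with learned query/key/value projections; the attention map `weights` is also what many interpretability approaches visualize to show which image regions support each generated word.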
Standard NLP metrics such as BLEU, ROUGE, METEOR, and CIDEr are commonly used to assess the linguistic quality of generated reports. However, these metrics fail to capture the clinical correctness of findings, leading to the introduction of alternative evaluation techniques such as RadGraph-based clinical metrics and expert radiologist assessments. Clinically relevant evaluation frameworks are crucial to ensuring that AI-generated reports align with radiological best practices and do not introduce critical diagnostic errors.

Despite substantial progress, several challenges hinder the widespread adoption of deep learning-based radiology report generation in clinical practice. The problem of hallucinated findings, where AI models generate clinically incorrect information, poses significant risks. Additionally, the black-box nature of deep learning models raises concerns regarding interpretability and trustworthiness, limiting their acceptance among medical professionals. Ethical and regulatory challenges, such as accountability, bias mitigation, and compliance with data privacy laws (e.g., HIPAA, GDPR), further complicate the deployment of automated radiology reporting systems.

To address these challenges, future research must focus on enhancing model robustness, improving interpretability, and ensuring clinical validation through real-world trials. Potential directions include the integration of domain-specific knowledge through medical ontologies, the development of human-in-the-loop AI systems in which radiologists collaborate with AI-generated report drafts, and the adoption of explainable AI (XAI) techniques to enhance transparency. Furthermore, expanding dataset diversity and establishing standardized reporting frameworks will be critical for developing AI systems that generalize across different institutions and patient demographics.
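As a concrete illustration of the n-gram metrics mentioned above, a minimal single-reference, sentence-level BLEU (uniform n-gram weights with a brevity penalty, no smoothing) can be written as follows. This is a simplified sketch for exposition, not the full corpus-level BLEU definition, and the report phrases are invented examples.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    # All contiguous n-grams of a token list
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(candidate, reference, max_n=4):
    """Simplified sentence-level BLEU: geometric mean of modified n-gram
    precisions (n = 1..max_n) times a brevity penalty. Single reference,
    no smoothing, so any zero precision collapses the score to 0."""
    precisions = []
    for n in range(1, max_n + 1):
        cand, ref = Counter(ngrams(candidate, n)), Counter(ngrams(reference, n))
        overlap = sum((cand & ref).values())        # clipped n-gram matches
        total = max(sum(cand.values()), 1)
        precisions.append(overlap / total)
    if min(precisions) == 0:
        return 0.0
    log_avg = sum(math.log(p) for p in precisions) / max_n
    bp = min(1.0, math.exp(1 - len(reference) / len(candidate)))
    return bp * math.exp(log_avg)

ref = "no acute cardiopulmonary abnormality is seen".split()
hyp = "no acute cardiopulmonary abnormality seen".split()
print(round(bleu(hyp, hyp), 2))  # identical sentences score 1.0
```

The sketch also illustrates the clinical-correctness gap discussed above: a candidate that merely drops the negation "no" reverses the diagnostic finding while still sharing most of its n-grams with the reference, which is precisely the failure mode that RadGraph-style clinical metrics and expert review are meant to catch.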
In conclusion, deep learning-based radiology report generation presents a transformative opportunity to enhance diagnostic workflows, reduce reporting workload, and improve patient care. However, the successful deployment of AI-driven systems in clinical settings requires addressing significant technical, ethical, and regulatory challenges. By advancing model interpretability, incorporating multimodal learning, and fostering collaboration between AI researchers and radiologists, the field can move toward the development of reliable and clinically meaningful radiology report generation systems. This systematic review aims to provide a foundation for future research and facilitate the safe and effective integration of AI-driven solutions in radiology.