A Resource-Efficient Approach to Text-Conditional Chest X-ray Generation Using Latent Diffusion Models
Abstract
Background: Current medical image generation models typically require substantial computational resources, creating practical barriers for many research institutions. Recent diffusion models achieve notable results but demand multiple high-end GPUs and large datasets, limiting accessibility and reproducibility in medical AI research.

Methods: We present a resource-efficient latent diffusion model for text-conditional chest X-ray generation, trained on a single NVIDIA RTX 4060 GPU using the Indiana University Chest X-ray dataset (3,301 frontal images). Our architecture combines a Variational Autoencoder (VAE) with 3.25M parameters and 8 latent channels, a U-Net denoising network with 39.66M parameters incorporating cross-attention mechanisms, and a BioBERT text encoder fine-tuned with parameter-efficient methods (593K trainable of 108.9M total parameters). We employ optimization strategies including gradient checkpointing, mixed precision training, and gradient accumulation to enable training within an 8GB VRAM budget.

Results: The model achieves a validation loss of 0.0221 after 387 epochs of diffusion training, with the VAE converging at epoch 67. Inference averages 663ms per 256×256 image on the RTX 4060, enabling near-real-time generation. Total training time was approximately 96 hours, compared to 552+ hours reported for comparable multi-GPU models. The system generates anatomically plausible chest X-rays conditioned on clinical text descriptions, including various pathological findings.

Conclusions: Our work demonstrates that effective medical image generation does not require massive computational resources. By achieving functional results with a single consumer GPU and limited data, we provide a practical pathway for medical AI research in resource-constrained settings. All code, model weights, and training configurations are publicly available at https://github.com/priyam-choksi/cxr-diffusion to facilitate reproducibility and further research.
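To make the text-conditioning pathway in the Methods concrete, the following is a minimal sketch, not the authors' released code: a frozen BioBERT backbone with only a small trainable projection (mirroring the abstract's ~593K-trainable-of-108.9M setup), whose token embeddings are consumed by a cross-attention layer the way U-Net image features would attend to them. The Hugging Face model id, projection width, and feature-map size are illustrative assumptions.

```python
# Hedged sketch of parameter-efficient text conditioning (assumptions noted inline).
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("dmis-lab/biobert-v1.1")  # assumed checkpoint
text_encoder = AutoModel.from_pretrained("dmis-lab/biobert-v1.1")
for p in text_encoder.parameters():
    p.requires_grad = False  # parameter-efficient: freeze the 108.9M-parameter backbone

proj = nn.Linear(768, 256)  # small trainable adapter (~197K params; sizes assumed)
cross_attn = nn.MultiheadAttention(embed_dim=256, num_heads=4, batch_first=True)

report = "Cardiomegaly with mild pulmonary vascular congestion."
tokens = tokenizer(report, return_tensors="pt", padding=True, truncation=True)
with torch.no_grad():
    text_emb = text_encoder(**tokens).last_hidden_state  # (1, seq_len, 768)
ctx = proj(text_emb)  # (1, seq_len, 256) conditioning context

# Toy stand-in for a flattened U-Net feature map attending to the report tokens.
img_feats = torch.randn(1, 32 * 32, 256)
conditioned, _ = cross_attn(query=img_feats, key=ctx, value=ctx)
print(conditioned.shape)  # torch.Size([1, 1024, 256])
```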
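The abstract also names three memory-saving techniques that make 8GB VRAM training feasible. Below is a minimal sketch of how gradient checkpointing, mixed precision, and gradient accumulation compose in a PyTorch training step, using a toy denoiser in place of the paper's 39.66M-parameter U-Net; the module, batch sizes, and accumulation factor are assumptions, not the authors' configuration.

```python
# Hedged sketch: the three VRAM-saving techniques from the abstract, combined.
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

class TinyDenoiser(nn.Module):
    """Toy stand-in for the U-Net; the real model denoises 8-channel VAE latents."""
    def __init__(self, channels=8, hidden=64):
        super().__init__()
        self.block1 = nn.Sequential(nn.Conv2d(channels, hidden, 3, padding=1), nn.SiLU())
        self.block2 = nn.Sequential(nn.Conv2d(hidden, hidden, 3, padding=1), nn.SiLU())
        self.out = nn.Conv2d(hidden, channels, 3, padding=1)

    def forward(self, x):
        # Gradient checkpointing: recompute activations in the backward pass
        # instead of storing them, trading extra compute for lower VRAM use.
        h = checkpoint(self.block1, x, use_reentrant=False)
        h = checkpoint(self.block2, h, use_reentrant=False)
        return self.out(h)

device = "cuda" if torch.cuda.is_available() else "cpu"
use_amp = device == "cuda"
model = TinyDenoiser().to(device)
opt = torch.optim.AdamW(model.parameters(), lr=1e-4)
scaler = torch.cuda.amp.GradScaler(enabled=use_amp)
accum_steps = 4  # effective batch = micro-batch x accum_steps (factor assumed)

opt.zero_grad(set_to_none=True)
for step in range(16):
    latents = torch.randn(2, 8, 32, 32, device=device)  # fake 8-channel latents
    noise = torch.randn_like(latents)
    # Mixed precision: run the forward pass in float16 where it is safe to do so.
    with torch.autocast(device_type="cuda", dtype=torch.float16, enabled=use_amp):
        pred = model(latents + noise)
        loss = nn.functional.mse_loss(pred, noise) / accum_steps
    scaler.scale(loss).backward()
    # Gradient accumulation: update weights only every `accum_steps` micro-batches.
    if (step + 1) % accum_steps == 0:
        scaler.step(opt)
        scaler.update()
        opt.zero_grad(set_to_none=True)
```

Together these let a small GPU simulate a larger effective batch at full numerical stability, which is the crux of the single-RTX-4060 training claim.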