Text-Guided Synthesis in Medical Multimedia Retrieval: A Framework for Enhanced Colonoscopy Image Classification and Segmentation
Abstract
The lack of large, varied, and thoroughly annotated datasets impedes the advancement of Artificial Intelligence (AI) for medical applications, especially colorectal cancer detection. Models trained on data of limited diversity often display biases, particularly when applied to disadvantaged groups. Generative models (e.g., DALL-E 2, VQ-GAN) have been used to generate images, but not colonoscopy data for intelligent data augmentation. This study developed an effective method for producing synthetic colonoscopy image data that can be used to train advanced medical diagnostic models for robust colorectal cancer detection and treatment. Text-to-image synthesis was performed using fine-tuned visual large language models. Stable Diffusion with DreamBooth Low-Rank Adaptation (LoRA) produced authentic-looking images, with an average Inception Score of 2.36 across three datasets. The validation accuracies of the classification models BiT, FixResNeXt, and EfficientNet were 92%, 91%, and 86%, respectively, while ViT and DeiT each reached 93%. For polyp segmentation, ground-truth masks were generated using the Segment Anything Model (SAM), and five segmentation models (U-Net, PSPNet, FPN, LinkNet, and MANet) were then adopted. FPN produced the best results, with an IoU of 0.64, an F1-score of 0.72, a recall of 0.87, and a Dice coefficient of 0.72. These findings highlight how AI-generated medical images can improve colonoscopy analysis, which is critical for early colorectal cancer detection.
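The text-to-image step can be sketched with the Hugging Face diffusers API. This is a minimal illustration, not the paper's exact configuration: the base checkpoint, the LoRA path `path/to/colon-lora`, the prompt, and the sampling parameters are all assumptions.

```python
# Minimal sketch: generating a synthetic colonoscopy frame with a
# Stable Diffusion pipeline plus DreamBooth-LoRA weights (hypothetical paths).
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # assumed base model
    torch_dtype=torch.float16,
).to("cuda")

# Attach the fine-tuned DreamBooth-LoRA weights on top of the base model.
pipe.load_lora_weights("path/to/colon-lora")  # hypothetical checkpoint path

# Generate one synthetic frame from a text prompt (prompt is illustrative).
image = pipe(
    "endoscopic view of a sessile polyp in the sigmoid colon",
    num_inference_steps=50,
    guidance_scale=7.5,
).images[0]
image.save("synthetic_polyp.png")
```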
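The classifiers named above (e.g., ViT, DeiT) are available as pretrained backbones in the timm library; a sketch of how such a model might be instantiated for fine-tuning follows, where the two-class head and input size are assumptions rather than details from the paper.

```python
# Sketch: a DeiT classifier for polyp vs. normal frames (head size assumed).
import timm
import torch

model = timm.create_model("deit_base_patch16_224", pretrained=True, num_classes=2)

x = torch.randn(8, 3, 224, 224)  # a batch of real or synthetic frames
logits = model(x)                # shape: (8, 2)
```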
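Ground-truth mask generation with SAM could look like the following sketch, assuming the official segment-anything package and the public ViT-H checkpoint; the input file name and the bounding-box prompt around the polyp are illustrative.

```python
# Sketch: producing a polyp mask with SAM from a box prompt.
import numpy as np
from PIL import Image
from segment_anything import sam_model_registry, SamPredictor

# Load the ViT-H SAM checkpoint (downloaded separately from the SAM repo).
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
predictor = SamPredictor(sam)

frame = np.array(Image.open("frame_0001.png").convert("RGB"))  # hypothetical file
predictor.set_image(frame)

# Prompt SAM with a rough bounding box (x0, y0, x1, y1) around the polyp.
masks, scores, _ = predictor.predict(
    box=np.array([120, 80, 260, 210]),  # illustrative coordinates
    multimask_output=False,
)
polyp_mask = masks[0]  # boolean HxW array used as the ground-truth mask
```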
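Finally, the segmentation stage and the reported metrics can be sketched with segmentation_models_pytorch; the encoder choice, binarization threshold, and input size below are assumptions, and the metric function is a generic binary IoU/Dice computation rather than the paper's evaluation code.

```python
# Sketch: an FPN polyp-segmentation model and binary IoU/Dice metrics.
import torch
import segmentation_models_pytorch as smp

model = smp.FPN(
    encoder_name="resnet34",      # assumed encoder
    encoder_weights="imagenet",
    in_channels=3,
    classes=1,                    # single foreground class: polyp
)

def iou_and_dice(pred_logits, target, eps=1e-7):
    """Binary IoU and Dice coefficient from raw logits and a {0,1} mask."""
    pred = (torch.sigmoid(pred_logits) > 0.5).float()
    inter = (pred * target).sum()
    union = pred.sum() + target.sum() - inter
    iou = (inter + eps) / (union + eps)
    dice = (2 * inter + eps) / (pred.sum() + target.sum() + eps)
    return iou.item(), dice.item()

# Usage on a dummy batch (input size must be divisible by 32 for the encoder).
logits = model(torch.randn(1, 3, 256, 256))
iou, dice = iou_and_dice(logits, torch.zeros(1, 1, 256, 256))
```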