Enhancing Medical Anomaly Detection via Text-Adapted Few-Shot Learning with Visual-Language Models


Abstract

Medical image anomaly detection (AD) is crucial for early disease diagnosis, yet it faces challenges such as data heterogeneity and the scarcity of annotated samples. This paper introduces a text-adapted few-shot training framework built on CLIP, which extends the text encoder to incorporate fine-grained descriptions and adds a text feature adapter for better alignment with image representations. A text-image feature alignment module and a contrastive learning mechanism are presented to enhance cross-modal integration and sharpen the distinction between normal and abnormal samples. Experimental evaluations on six medical imaging datasets demonstrate that our method significantly outperforms state-of-the-art techniques in both classification and segmentation tasks, achieving an average AUC improvement of 1.13%. The implementation code is available at https://github.com/clownddd/TAFT.
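The core idea described above — adapting text embeddings of normal/abnormal prompts and scoring an image by its similarity to each — can be sketched in a few lines. This is a minimal, hypothetical illustration, not the paper's implementation: the residual linear adapter, the embedding dimension, and the temperature value are all assumptions, and real CLIP features would come from a pretrained encoder rather than unit vectors.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

class TextAdapter:
    """Hypothetical residual linear adapter for text features.

    A small learned perturbation W is added on top of the frozen
    text embedding, preserving the pretrained prior (assumption:
    the paper's adapter may differ in form).
    """
    def __init__(self, dim, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(scale=0.01, size=(dim, dim))  # near-identity init

    def __call__(self, t):
        return t + t @ self.W  # residual connection

def anomaly_score(image_feat, normal_text, abnormal_text, adapter, temp=100.0):
    """Probability that the image matches the 'abnormal' prompt.

    Temperature-scaled softmax over cosine similarities, mirroring
    CLIP-style zero/few-shot classification.
    """
    sims = np.array([cosine(image_feat, adapter(normal_text)),
                     cosine(image_feat, adapter(abnormal_text))])
    logits = temp * sims
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return float(probs[1])  # index 1 = abnormal

# Toy usage with orthogonal stand-in embeddings (dim=4 for brevity):
img = np.array([1.0, 0.0, 0.0, 0.0])   # image feature aligned with "normal"
normal = np.array([1.0, 0.0, 0.0, 0.0])
abnormal = np.array([0.0, 1.0, 0.0, 0.0])
adapter = TextAdapter(dim=4)
score = anomaly_score(img, normal, abnormal, adapter)
```

In this toy setup the image feature coincides with the normal prompt's embedding, so the score should be close to 0; in the full method the adapter weights would be trained with the contrastive objective so that abnormal images score high.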