Hierarchical Prompt Composition for Memory-Efficient Open-World Continual Learning in Vision-Language Foundation Models


Abstract

Foundation models pre-trained on web-scale data, such as CLIP, exhibit strong zero-shot visual recognition capabilities. However, their deployment in open-world scenarios is constrained by catastrophic forgetting and an inability to efficiently incorporate novel concepts without full retraining. This paper introduces the Hierarchical Prompt Composition Network (HPC-Net), a memory-efficient architecture that enables vision-language models to learn incrementally in open environments. HPC-Net maintains a dynamically evolving hierarchy of learnable prompt components that are composed to form task-specific representations while preserving the model's foundational zero-shot capabilities. The architecture exploits the hierarchical compositionality of visual concepts through a three-tier prompt decomposition: (1) foundational prompts encoding broad semantic primitives, (2) compositional prompts for mid-level visual patterns, and (3) instance prompts for category-specific features. A Semantic Prototype Anchoring mechanism is introduced to prevent semantic drift in the shared prompt space, and a Contrastive Prompt Routing module dynamically selects and combines prompts for each input. Extensive experiments across four open-world benchmarks (Split-CIFAR100, Split-ImageNet-R, CORe50, and a new medical imaging benchmark, MedStream-7k) demonstrate that HPC-Net achieves an average accuracy of $84.3 \pm 0.9\%$, a $5.4\%$ absolute improvement over the strongest baseline. This is accomplished while retaining $98.4\%$ of the base model's zero-shot performance on seen domains and requiring only $2.1$M additional parameters (11.6x fewer than adapter-based fusion methods). All code, datasets, and pre-trained models will be released to facilitate reproducibility.
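
To make the three-tier prompt decomposition and input-dependent routing described above concrete, the following is a minimal, illustrative sketch in PyTorch. All names, dimensions, pool sizes, and the cosine-similarity top-k routing rule are assumptions introduced here for illustration; the paper's actual HPC-Net implementation (including Semantic Prototype Anchoring and the Contrastive Prompt Routing objective) may differ substantially.

```python
# Illustrative sketch only: three learnable prompt pools (foundational,
# compositional, instance) whose entries are selected per input by cosine
# similarity between a frozen image feature and learned routing keys.
import torch
import torch.nn as nn
import torch.nn.functional as F


class HierarchicalPromptComposer(nn.Module):
    """Compose prompt tokens from three tiers of learnable prompt pools."""

    def __init__(self, embed_dim=512, pool_sizes=(4, 16, 64), top_k=(1, 2, 4)):
        super().__init__()
        self.top_k = top_k
        # One prompt pool and one routing-key matrix per tier (assumed sizes).
        self.pools = nn.ParameterList(
            nn.Parameter(torch.randn(n, embed_dim) * 0.02) for n in pool_sizes
        )
        self.keys = nn.ParameterList(
            nn.Parameter(torch.randn(n, embed_dim) * 0.02) for n in pool_sizes
        )

    def forward(self, query):
        """query: (batch, embed_dim) image feature from a frozen CLIP encoder.
        Returns (batch, sum(top_k), embed_dim) composed prompt tokens."""
        selected = []
        for pool, keys, k in zip(self.pools, self.keys, self.top_k):
            # Cosine similarity between the query and each routing key.
            sim = F.normalize(query, dim=-1) @ F.normalize(keys, dim=-1).t()
            idx = sim.topk(k, dim=-1).indices      # (batch, k)
            selected.append(pool[idx])             # (batch, k, embed_dim)
        # Concatenate tiers into one prompt sequence per input.
        return torch.cat(selected, dim=1)


if __name__ == "__main__":
    composer = HierarchicalPromptComposer()
    image_features = torch.randn(8, 512)   # stand-in for CLIP image features
    prompts = composer(image_features)
    print(prompts.shape)                    # torch.Size([8, 7, 512])
```

In a setup like this, the composed prompt tokens would typically be prepended to the frozen backbone's token sequence, so that only the small prompt pools and routing keys are trained, which is consistent with the parameter budget reported in the abstract.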
