SHIT: A Negative Adaptive Attention Model for Few-Shot Learning, Named APA


Abstract

We present Adaptive Prototype Attention (APA), a task-aware, prototype-guided, multi-scale attention mechanism tailored for few-shot learning with Transformer-style architectures. APA (i) modulates attention weights with task context, (ii) injects prototype-conditioned signals to enhance within-class cohesion and between-class separation, and (iii) aggregates local and global dependencies across multiple scales. In controlled few-shot classification experiments (5-way, 5-shot, synthetic episodes), APA consistently underperforms strong baselines. Compared with standard attention, APA decreases accuracy from 0.425 to 0.208 and macro-F1 from 0.419 to 0.084; relative to prototype-only and multi-scale-only variants, APA's accuracy drops by 0.205 and 0.232, respectively. APA converges only after ∼921.7 epochs with a final loss ≈ 0.0000, indicating slow optimization, and attention visualizations exhibit non-compact, task-agnostic patterns (all experimental results are taken from the user-provided run logs). These findings suggest that the current APA design's coupling of task-aware modulation, prototype guidance, and multi-scale aggregation is ineffective in data-scarce regimes, and they offer a practical warning for attention-mechanism design in few-shot learning.
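Since the abstract is the only description of the mechanism available here, a minimal PyTorch sketch of the three components may help fix ideas. Everything below is an assumption rather than the authors' implementation: the module and parameter names, the sigmoid task gate, the max-over-prototypes logit bias, the banded local mask, and the learned local/global blend are all hypothetical choices consistent with the abstract's description.

```python
# Illustrative sketch only: no code accompanies this preprint, so every
# design choice below is an assumption based on the abstract.
import torch
import torch.nn as nn
import torch.nn.functional as F


class AdaptivePrototypeAttention(nn.Module):
    """Hypothetical single-head APA layer combining (i) task-aware
    modulation, (ii) prototype-conditioned signals, and (iii) a crude
    two-scale (local + global) aggregation."""

    def __init__(self, dim: int, local_window: int = 3):
        super().__init__()
        self.q_proj = nn.Linear(dim, dim)
        self.k_proj = nn.Linear(dim, dim)
        self.v_proj = nn.Linear(dim, dim)
        # (i) task context (e.g. mean support embedding) -> per-dim gate
        self.task_gate = nn.Sequential(nn.Linear(dim, dim), nn.Sigmoid())
        # (ii) prototype-conditioned additive bias on attention logits
        self.proto_proj = nn.Linear(dim, dim)
        self.local_window = local_window
        self.scale = dim ** -0.5
        self.mix = nn.Parameter(torch.tensor(0.5))  # local/global blend

    def forward(self, x, task_ctx, prototypes):
        # x: (B, N, D) tokens; task_ctx: (B, D); prototypes: (B, C, D)
        q = self.q_proj(x) * self.task_gate(task_ctx).unsqueeze(1)   # (i)
        k, v = self.k_proj(x), self.v_proj(x)
        logits = q @ k.transpose(-2, -1) * self.scale                # (B, N, N)
        # (ii) bias each query by its similarity to the nearest prototype
        proto_sim = q @ self.proto_proj(prototypes).transpose(-2, -1)  # (B, N, C)
        logits = logits + proto_sim.max(dim=-1, keepdim=True).values
        # (iii) blend full (global) attention with a banded local variant
        global_out = F.softmax(logits, dim=-1) @ v
        idx = torch.arange(x.size(1), device=x.device)
        local_mask = (idx[None, :] - idx[:, None]).abs() <= self.local_window
        local_logits = logits.masked_fill(~local_mask, float("-inf"))
        local_out = F.softmax(local_logits, dim=-1) @ v
        w = torch.sigmoid(self.mix)
        return w * local_out + (1.0 - w) * global_out


# Smoke test on a synthetic 5-way episode with random embeddings.
if __name__ == "__main__":
    B, N, C, D = 2, 16, 5, 64
    apa = AdaptivePrototypeAttention(D)
    out = apa(torch.randn(B, N, D), torch.randn(B, D), torch.randn(B, C, D))
    print(out.shape)  # torch.Size([2, 16, 64])
```

One consequence visible even in this sketch is the coupling the abstract warns about: the task gate, prototype bias, and scale blend all reshape the same attention logits, so a miscalibrated signal from any one component distorts the whole attention map, which is consistent with the non-compact, task-agnostic patterns reported in the visualizations.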
