Modeling Shortcut Deviation in Structured Representation Space for Reliable Neural Prediction

Abstract

Shortcut learning often induces representation shift in neural networks, undermining their generalization capabilities. To address this issue, we propose a novel framework, Feature Deviation in Structured Representation (FDSR), which explicitly models the deviation induced by shortcut functions within the structured representation space. By employing feature–function response mapping, FDSR isolates sub-representations influenced by spurious shortcut paths and accurately characterizes their interference with model predictions. We develop an information-theoretic metric, the Feature Deviation Score (FDS), and further introduce a Task-Specific Manifold Projection method to reconstruct task-aligned representation spaces. On bias-controlled benchmarks such as Waterbirds and HANS, empirical results demonstrate a strong negative correlation between FDS and prediction accuracy on textual tasks (Pearson's r = –0.74, p < 0.001), validating FDS as a reliable measure of shortcut-induced representation shift. In image classification tasks, models trained with the proposed FDSR framework achieve an average improvement of 13.2% ± 2.1% in out-of-distribution (OOD) accuracy. These findings provide both theoretical insights and practical tools for mitigating shortcut learning and enhancing the robustness of neural models.
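The abstract does not give the formula for the Feature Deviation Score, only that it is an information-theoretic measure of how far shortcut-influenced sub-representations pull features away from a task-aligned space. The sketch below is purely illustrative under that reading: the function names (task_aligned_projection, feature_deviation_score), the use of a between-class SVD subspace as the "task-aligned" space, and the symmetric KL divergence between a feature vector and its projection are all assumptions, not the authors' method.

import numpy as np


def task_aligned_projection(features, labels, n_components=3):
    # Hypothetical task-aligned subspace: top directions of the
    # between-class (class-mean) scatter. Illustrative only; not the
    # paper's Task-Specific Manifold Projection.
    classes = np.unique(labels)
    means = np.stack([features[labels == c].mean(axis=0) for c in classes])
    means -= means.mean(axis=0, keepdims=True)
    # Right singular vectors of the centered class means span the subspace.
    _, _, vt = np.linalg.svd(means, full_matrices=False)
    return vt[:n_components]  # (k, d) orthonormal rows


def feature_deviation_score(features, basis, eps=1e-8):
    # Hypothetical per-sample deviation: symmetric KL divergence between a
    # softmax-normalized feature vector and its reconstruction from the
    # task-aligned basis. One plausible "information-theoretic" reading.
    proj = features @ basis.T @ basis  # reconstruct within the subspace

    def softmax(x):
        x = x - x.max(axis=1, keepdims=True)
        e = np.exp(x)
        return e / e.sum(axis=1, keepdims=True)

    p, q = softmax(features), softmax(proj)
    kl_pq = np.sum(p * np.log((p + eps) / (q + eps)), axis=1)
    kl_qp = np.sum(q * np.log((q + eps) / (p + eps)), axis=1)
    return 0.5 * (kl_pq + kl_qp)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.normal(size=(200, 64))   # stand-in penultimate-layer features
    labels = rng.integers(0, 5, size=200)
    basis = task_aligned_projection(feats, labels, n_components=3)
    fds = feature_deviation_score(feats, basis)
    print("mean deviation score:", float(fds.mean()))

Correlating such a per-sample score with per-sample correctness (for example with scipy.stats.pearsonr) would mirror the FDS-versus-accuracy analysis reported in the abstract, though the paper's actual estimator may differ.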
