Key-Activated Generative Security: Enforcing Access Control in Image Captioning Models
Abstract
While the rapid progress of deep learning has endowed image captioning systems with remarkable generative power, it has simultaneously exposed them to unprecedented risks of intellectual property (IP) leakage and illicit replication. Conventional defense methods, most notably watermarking, have proven effective for discriminative classifiers but remain inadequate for captioning networks, whose outputs are structured, semantic, and human-readable. Such watermarking strategies generally function as passive verification tools, leaving them prone to removal or circumvention and unable to actively prevent unauthorized model usage. In this work, we introduce \textbf{SKIC}, a novel security framework that embeds secret-key conditions into the recurrent memory dynamics of captioning models. By intertwining a cryptographic-style key with internal state transitions, SKIC enforces that only executions with valid keys produce coherent captions; forged or incorrect keys collapse the generation process. This paradigm fundamentally repositions ownership protection from retrospective verification to proactive access control. Comprehensive experiments on the MS-COCO and Flickr30k datasets show that SKIC achieves two complementary goals: maintaining indistinguishable caption quality under authorized use while delivering complete functional breakdown under illegitimate keys. To the best of our knowledge, SKIC is the first mechanism to integrate secret-key-based security directly into generative captioning systems, establishing a new standard for safeguarding neural IP assets.
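To make the core idea concrete, the following is a minimal sketch of key-conditioned recurrent state modulation. It is purely illustrative: the abstract does not specify how the key is mapped into the state dynamics, so the `key_gate` derivation and elementwise gating below are hypothetical constructions, not the paper's actual method.

```python
import hashlib

def key_gate(secret_key: str, hidden_size: int) -> list[float]:
    """Derive a deterministic gating vector in (0, 1] from a secret key.
    Hypothetical construction; SKIC's real key-to-state mapping is learned."""
    digest = hashlib.sha256(secret_key.encode()).digest()
    # Cycle over the digest bytes to cover hidden_size, map each byte to (0, 1].
    return [(digest[i % len(digest)] + 1) / 256.0 for i in range(hidden_size)]

def keyed_state_update(h: list[float], gate: list[float]) -> list[float]:
    """Modulate the recurrent hidden state elementwise with the key gate.
    Intuition: the trained weights compensate for the correct key's gating,
    while a forged key scrambles the state and collapses generation."""
    return [hi * gi for hi, gi in zip(h, gate)]

# Toy hidden state; different keys send it along different trajectories.
h = [0.5, -1.2, 0.8, 0.1]
valid = keyed_state_update(h, key_gate("correct-key", 4))
forged = keyed_state_update(h, key_gate("forged-key", 4))
print(valid != forged)
```

The elementwise gate is only one of many possible conditioning points; comparable effects could be obtained by injecting the key vector additively into the LSTM cell state or by using it to select among weight subspaces.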