Practical Black-box Watermark Removal via Knowledge Distillation into Compact Models


Abstract

As deep neural networks continue to grow in scale and deployment scope, protecting model intellectual property has become increasingly important. Black-box watermarking techniques embed hidden ownership signals via trigger inputs, enabling verification without access to model internals. However, existing works typically evaluate extraction attacks under an optimistic assumption: the attacker’s surrogate model is at least as large as the victim model, and watermarks that survive in this setting are deemed secure. Our study shows that the surrogate model’s capacity has a strong impact on watermark retention. When the surrogate model is smaller than the victim model, its limited capacity often fails to preserve the embedded watermark, even if main-task performance remains high. Motivated by this observation, we propose the Capacity Exploited Watermark Removal Attack (CEWRA), a black-box watermark removal framework that leverages knowledge distillation into deliberately low-capacity neural architectures. By reducing model depth and parameter count, CEWRA disrupts the representation subspace used to encode watermark signals while preserving essential features for the primary task. We evaluate CEWRA on three benchmark datasets and three state-of-the-art black-box watermarking schemes: EWE (USENIX 2021), MEA (S&P 2024), and SSW (ACM MM 2023). On CIFAR-100, CEWRA reduces the Watermark Success Rate to 0% for EWE and the robust SSW-S variant, and to 21.8% for SSW-P, while keeping the accuracy drop on the primary task within 2.5%. Compared to existing removal techniques, CEWRA shows superior robustness and generalizability across architectures such as ResNet and VGG, revealing a capacity-related vulnerability in current black-box watermarking strategies and underscoring the need for capacity-aware IP protection under realistic extraction scenarios.
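To make the two quantities in the abstract concrete, here is a minimal sketch, not the authors' implementation. It assumes the black-box victim returns a probability vector per query and that the small surrogate is trained to match it with a KL-divergence distillation loss; the abstract does not specify the loss. The function names `softmax`, `distillation_loss`, and `watermark_success_rate` are hypothetical, and Watermark Success Rate is taken here to mean the percentage of trigger inputs classified as their watermark labels.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(victim_probs, student_logits, temperature=1.0):
    """KL(victim || student) for one query (assumed loss, see lead-in).

    victim_probs: probability vector returned by the black-box victim.
    student_logits: raw outputs of the low-capacity surrogate.
    """
    student_probs = softmax(student_logits, temperature)
    return sum(
        p * (math.log(p) - math.log(q))
        for p, q in zip(victim_probs, student_probs)
        if p > 0  # terms with p == 0 contribute nothing to the KL sum
    )


def watermark_success_rate(predicted_labels, trigger_labels):
    """Percentage of trigger inputs still mapped to their watermark labels."""
    hits = sum(1 for p, t in zip(predicted_labels, trigger_labels) if p == t)
    return 100.0 * hits / len(trigger_labels)
```

Under these assumptions, a removal attack succeeds when `watermark_success_rate` on the surrogate's trigger-set predictions falls toward 0 while its accuracy on clean test data stays close to the victim's.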
