Robust Deep Active Learning via Distance-Measured Data Mixing and Adversarial Training

Abstract

Accurately estimating uncertainty on unlabeled data is a fundamental challenge in active learning. Traditional deep active learning approaches suffer from a critical limitation: uncertainty-based selection strategies tend to concentrate excessively around noisy decision boundaries, while diversity-based methods may miss samples that are crucial for decision-making. When deep neural networks serve as the backbone architecture, over-reliance on their confidence scores often results in suboptimal data selection. We introduce Distance-Measured Data Mixing (DM2), a novel framework that estimates sample uncertainty through distance-weighted data mixing to capture inter-sample relationships and the underlying data manifold structure. This approach enables informative sample selection across the entire data distribution while maintaining focus on near-boundary regions without overfitting to the most ambiguous instances. To address the noise and instability inherent in boundary regions, we propose a boundary-aware feature fusion mechanism integrated with fast-gradient adversarial training. This technique generates adversarial counterparts of selected near-boundary samples and trains on them jointly with the original instances, thereby enhancing model robustness and generalization under complex or imbalanced data conditions. Comprehensive experiments across diverse tasks, model architectures, and data modalities demonstrate that our approach consistently surpasses strong uncertainty-based and diversity-based baselines while significantly reducing the number of labeled samples required for effective learning.
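
The abstract does not give the scoring formula, so the following is only a minimal sketch of the two ingredients it names, under stated assumptions. A toy logistic model stands in for the deep backbone. The function `mixing_uncertainty`, its exponential distance kernel, and the parameters `lam` and `tau` are hypothetical choices for illustration and are not taken from the paper. The `fgsm` function is the standard fast gradient sign step, written out for the toy model.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(x, w, b):
    """Probability of class 1 under a toy logistic model (stand-in for a deep net)."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def entropy(p):
    """Binary predictive entropy in nats; largest at p = 0.5."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log(p) + (1 - p) * math.log(1 - p))

def mixing_uncertainty(x, pool, w, b, lam=0.5, tau=1.0):
    """Hypothetical score (an assumption, not the paper's formula):
    distance-weighted mean entropy of x mixed with each pool point."""
    num = den = 0.0
    for other in pool:
        weight = math.exp(-math.dist(x, other) / tau)  # assumed kernel: closer points count more
        mixed = [lam * a + (1 - lam) * c for a, c in zip(x, other)]
        num += weight * entropy(predict(mixed, w, b))
        den += weight
    return num / den if den > 0 else 0.0

def fgsm(x, y, w, b, eps=0.1):
    """Fast gradient sign step for the toy model: x + eps * sign(dLoss/dx)."""
    p = predict(x, w, b)
    grad = [(p - y) * wi for wi in w]  # gradient of binary cross-entropy w.r.t. x
    sign = lambda g: (g > 0) - (g < 0)
    return [xi + eps * sign(g) for xi, g in zip(x, grad)]

# Toy setup: decision boundary is the line x0 = x1.
w, b = [1.0, -1.0], 0.0
pool = [[0.0, 0.0], [2.0, -2.0], [-2.0, 2.0]]

near = mixing_uncertainty([0.1, 0.0], pool, w, b)   # candidate close to the boundary
far = mixing_uncertainty([3.0, -3.0], pool, w, b)   # candidate far from the boundary

# Adversarial counterpart of the near-boundary sample, to be trained on with the original.
x_adv = fgsm([0.1, 0.0], 1, w, b, eps=0.1)
```

In this toy setup the near-boundary candidate receives the higher score, and the adversarial counterpart lowers the model's probability for the true label, which is the kind of sample pair the abstract describes training on jointly. A real implementation would compute distances in a learned feature space and take gradients through the network, neither of which is shown here.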
