An Experimental Study of Semi-supervised Clustering Performance in Presence of Imbalanced and Noisy Constraints

Abstract

Semi-supervised clustering incorporates prior knowledge, such as class labels or pairwise constraints, into classical clustering methods to obtain better-quality clustering results. While many publications have made great efforts to develop novel and efficient semi-supervised clustering variants, the impact of imbalanced and noisy constraint sets has not been fully explored. In this article, we simulate realistic settings in which the input is noisy, that is, the constraint sets are skewed or contain noisy constraints. We analyze the robustness and accuracy of six state-of-the-art semi-supervised clustering algorithms and highlight the scenarios for which each approach is best suited. The experimental results on prominent UCI benchmark datasets demonstrate that most semi-supervised clustering approaches benefit more from must-link constraints than from cannot-link constraints. Moreover, a constraint set consisting only of cannot-link constraints sometimes leads to a decrease in performance. We also find that semi-supervised clustering approaches show poor robustness against noisy constraints, especially noisy must-link constraints.
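The abstract does not say how the skewed and noisy constraint sets were produced. The sketch below shows one common way to simulate them, under our own assumptions rather than the authors' protocol: pairs are sampled from ground-truth class labels, a hypothetical `ml_fraction` parameter sets the must-link share of the constraint set, and a hypothetical `noise_rate` parameter sets the fraction of each set that contradicts the labels (a noisy must-link joins two points from different classes; a noisy cannot-link separates two points from the same class). `make_constraints` is an illustrative name, not part of any library.

```python
import random
from collections import Counter
from typing import List, Set, Tuple

Pair = Tuple[int, int]


def make_constraints(
    labels: List[int],
    n_constraints: int,
    ml_fraction: float,
    noise_rate: float,
    seed: int = 0,
) -> Tuple[List[Pair], List[Pair]]:
    """Hypothetical helper: return (must_link, cannot_link) pair lists.

    ml_fraction controls imbalance (1.0 = pure must-link, 0.0 = pure
    cannot-link); noise_rate is the fraction of each set whose pairs
    contradict the ground-truth labels. Assumed setup, for illustration only.
    """
    rng = random.Random(seed)
    n = len(labels)
    n_ml = round(n_constraints * ml_fraction)
    n_cl = n_constraints - n_ml
    noisy_ml = round(n_ml * noise_rate)
    noisy_cl = round(n_cl * noise_rate)

    # Count the same-class and different-class pairs that exist at all,
    # so the rejection sampler below cannot loop forever.
    same_total = sum(c * (c - 1) // 2 for c in Counter(labels).values())
    diff_total = n * (n - 1) // 2 - same_total
    if ((n_ml - noisy_ml) + noisy_cl > same_total
            or (n_cl - noisy_cl) + noisy_ml > diff_total):
        raise ValueError("not enough distinct pairs for the requested sets")

    used: Set[Pair] = set()

    def sample(same_class: bool, k: int) -> List[Pair]:
        # Rejection-sample k unused pairs whose true relation matches same_class.
        out: List[Pair] = []
        while len(out) < k:
            i, j = sorted(rng.sample(range(n), 2))
            if (labels[i] == labels[j]) == same_class and (i, j) not in used:
                used.add((i, j))
                out.append((i, j))
        return out

    # Correct must-links share a class; noisy must-links do not.
    must_link = sample(True, n_ml - noisy_ml) + sample(False, noisy_ml)
    # Correct cannot-links span two classes; noisy cannot-links do not.
    cannot_link = sample(False, n_cl - noisy_cl) + sample(True, noisy_cl)
    return must_link, cannot_link


# Usage: 20 constraints, 80% must-link, 25% of each set noisy.
labels = [0] * 10 + [1] * 10
ml, cl = make_constraints(labels, 20, ml_fraction=0.8, noise_rate=0.25, seed=1)
```

Fixing the number of noisy constraints, instead of corrupting each constraint independently with probability `noise_rate`, keeps the realized noise level exact, which makes runs at different noise levels easier to compare. Setting `ml_fraction=0.0` yields a pure cannot-link set of the kind the abstract reports can hurt performance.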