A Robust Classifier for Label Noise Using Random Forest Kernel
Abstract
Many real-world classification datasets contain label noise, which degrades classification performance. A prominent group of label noise detection methods is nearest-neighbor-based; these methods tend to unfairly penalize clean samples that lie near the decision boundary. To alleviate this issue, we propose a new noise detection method that finds nearest neighbors using the distance function implicitly defined by randomized tree ensembles. These ensembles have been shown to be robust to label noise, an important property that the proposed detector exploits. In the first phase of our investigation, we analyze the characteristics of this noise detector and demonstrate its effectiveness. Next, we compare several ways of integrating the noise likelihood estimates obtained in the first phase with existing classification algorithms. Lastly, we propose a novel noise-robust variant of random forest that significantly outperforms regular random forest in the presence of high levels of label noise and is competitive with robust state-of-the-art classification algorithms across a number of benchmark datasets.
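To make the idea of a tree-ensemble-defined neighborhood concrete, the following is a minimal sketch, not the authors' implementation. It assumes one common way of deriving a similarity from a tree ensemble: the fraction of trees in which two samples fall into the same leaf. The function names `forest_proximity` and `noise_scores`, the parameter `k`, the disagreement-based score, and the toy leaf matrix are all hypothetical choices made for illustration; the abstract does not specify how the distance or the noise likelihood is actually computed.

```python
from typing import List, Sequence


def forest_proximity(leaves: Sequence[Sequence[int]]) -> List[List[float]]:
    """Fraction of trees in which each pair of samples shares a leaf.

    leaves[i][t] is the leaf index of sample i in tree t.
    """
    n = len(leaves)
    n_trees = len(leaves[0])
    prox = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            shared = sum(1 for a, b in zip(leaves[i], leaves[j]) if a == b)
            prox[i][j] = shared / n_trees
    return prox


def noise_scores(
    leaves: Sequence[Sequence[int]], labels: Sequence[int], k: int = 3
) -> List[float]:
    """Assumed noise score: share of the k most similar samples
    (by forest proximity) whose label disagrees with the sample's own."""
    prox = forest_proximity(leaves)
    scores = []
    for i in range(len(labels)):
        others = [j for j in range(len(labels)) if j != i]
        # Most similar first; Python's sort is stable, so ties keep index order.
        others.sort(key=lambda j: -prox[i][j])
        neighbors = others[:k]
        disagree = sum(1 for j in neighbors if labels[j] != labels[i])
        scores.append(disagree / len(neighbors))
    return scores


# Toy data: 6 samples, 4 trees. Leaf indices are made up for illustration.
leaves = [
    [0, 1, 0, 2],
    [0, 1, 0, 2],
    [0, 1, 1, 2],
    [3, 4, 5, 6],
    [3, 4, 5, 6],
    [3, 4, 5, 7],
]
labels = [0, 0, 1, 1, 1, 1]  # sample 2 groups with class 0 but is labeled 1
scores = noise_scores(leaves, labels, k=2)
print(scores)
```

In this toy example, sample 2 receives the highest score because both of its closest neighbors under the forest proximity carry a different label. With scikit-learn, a leaf matrix of this shape can be obtained from a fitted `RandomForestClassifier` or `ExtraTreesClassifier` via its `apply(X)` method, which returns one leaf index per sample and tree.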