Outlier detection for mixed data
Abstract
Outlier detection is crucial in sectors such as finance, insurance, medicine, and IT security. It helps to identify suspicious behaviors and to make statistical models more robust. However, a common challenge arises when the data include both numerical and categorical attributes, because most traditional outlier detection methods apply only to numerical data. To overcome this limitation, this study proposes applying the Factor Analysis on Mixed Data (FAMD) method to transform both types of attributes into numerical Principal Components (PCs). Traditional outlier detection methods are then applied to these PCs. The proposed method is evaluated on classic datasets from the supervised classification literature and on two simulated datasets contaminated by four types of outliers: (a) global outliers, which deviate significantly from most data points; (b) local outliers, which are not necessarily extreme values but are abnormal within their specific context or neighborhood; (c) rare outliers, which have unexpected categories compared to the typical data distribution; and (d) mixed outliers, which can be both global and rare, or local and rare.
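The two-step idea described in the abstract (FAMD first, then a numerical outlier detector on the PCs) can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `famd_components` and `pc_distance_scores`, the toy data, and the choice of a variance-scaled distance on the PCs as the detector are all assumptions made for illustration, since the abstract does not say which detectors are applied. The FAMD step follows the usual formulation: numerical columns are standardized, each category indicator is divided by the square root of its frequency, and a PCA is run on the centered result.

```python
import numpy as np

def famd_components(numeric, categorical, n_components=2):
    """Hypothetical helper: FAMD scores via PCA on a rescaled design matrix.

    numeric: (n, p) array of numerical attributes (no constant columns).
    categorical: list of length-n label sequences, one per categorical attribute.
    """
    numeric = np.asarray(numeric, dtype=float)
    # Standardize numerical columns to zero mean and unit variance.
    blocks = [(numeric - numeric.mean(axis=0)) / numeric.std(axis=0)]
    for col in categorical:
        col = np.asarray(col)
        levels = np.unique(col)
        # One-hot indicators, each divided by sqrt(category proportion).
        indicator = (col[:, None] == levels[None, :]).astype(float)
        blocks.append(indicator / np.sqrt(indicator.mean(axis=0)))
    X = np.hstack(blocks)
    X = X - X.mean(axis=0)
    # PCA through the singular value decomposition of the centered matrix.
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]
    eigvals = s[:n_components] ** 2 / X.shape[0]
    return scores, eigvals

def pc_distance_scores(scores, eigvals):
    """Assumed detector: squared distance on the PCs, scaled by PC variance."""
    return np.sum(scores ** 2 / eigvals, axis=1)

# Toy data (assumption): 200 regular rows plus one "mixed" outlier that is
# extreme on both numerical attributes and carries an unseen category "z".
rng = np.random.default_rng(0)
numeric = np.vstack([rng.normal(size=(200, 2)), [8.0, -8.0]])
cats = np.append(rng.choice(["a", "b"], size=200), "z")

scores, eigvals = famd_components(numeric, [cats], n_components=2)
d = pc_distance_scores(scores, eigvals)
print("index of highest-scoring row:", int(np.argmax(d)))
```

Keeping only the leading components avoids dividing by near-zero eigenvalues, which arise because the centered indicator columns of each categorical attribute are linearly dependent. Any other numerical detector could be applied to `scores` in place of the distance used here.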