Outlier detection for mixed data

Houda GADACHA
Patricia Kubicki
Ndeye Niang

Abstract

Outlier detection is crucial in sectors such as finance, insurance, medicine, and IT security, where it helps to identify suspicious behaviors and to make statistical models more robust. A common challenge arises when the data include both numerical and categorical attributes, because most traditional outlier detection methods apply only to numerical data. To overcome this limitation, this study proposes applying the Factor Analysis on Mixed Data (FAMD) method to transform both types of attributes into numerical Principal Components (PCs). Traditional outlier detection methods are then applied to these PCs. The proposed method is evaluated on classic datasets from the supervised classification literature and on two simulated datasets contaminated by different types of outliers: (a) global outliers, which deviate significantly from most data points; (b) local outliers, which are not necessarily extreme values but are abnormal within their specific context or neighborhood; (c) rare outliers, which have unexpected categories compared to the typical data distribution; and (d) mixed outliers, which can be both global and rare, or both local and rare.
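The two-step pipeline described in the abstract (FAMD projection, then a numerical outlier detector on the PCs) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes the standard FAMD coding (z-scored numerical columns, indicator columns centered and divided by the square root of the category proportion) followed by an SVD. The abstract does not name the detector applied to the PCs, so the score distance used here is a stand-in chosen only for illustration. The function names `famd_scores` and `score_distance` and the toy data are hypothetical.

```python
import numpy as np


def famd_scores(X_num, X_cat, n_components=2):
    """Project mixed data onto FAMD-style principal components (sketch)."""
    X_num = np.asarray(X_num, dtype=float)
    n = X_num.shape[0]
    # Numerical columns: center and scale to unit variance.
    blocks = [(X_num - X_num.mean(axis=0)) / X_num.std(axis=0)]
    # Categorical columns: one indicator per level, centered on the level
    # proportion p and divided by sqrt(p), so rare levels weigh more.
    for col in np.asarray(X_cat, dtype=object).T:
        levels = sorted(set(col))
        ind = np.array([[1.0 if v == lev else 0.0 for lev in levels] for v in col])
        p = ind.mean(axis=0)
        blocks.append((ind - p) / np.sqrt(p))
    Z = np.hstack(blocks)
    # PCA of the combined matrix via SVD.
    U, s, _ = np.linalg.svd(Z, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]
    eigvals = s[:n_components] ** 2 / n
    return scores, eigvals


def score_distance(scores, eigvals):
    """Distance from the origin in PC space, each PC scaled by its variance.

    Assumption: an illustrative detector, not the one used in the study.
    """
    return np.sqrt((scores ** 2 / eigvals).sum(axis=1))


# Toy data (hypothetical): 50 inliers plus one mixed outlier that is both
# global (extreme numerical values) and rare (unseen category "c").
rng = np.random.default_rng(0)
X_num = np.vstack([rng.normal(size=(50, 2)), [[8.0, -8.0]]])
X_cat = [["a" if i % 2 == 0 else "b"] for i in range(50)] + [["c"]]

scores, eigvals = famd_scores(X_num, X_cat, n_components=2)
dist = score_distance(scores, eigvals)
flagged = int(np.argmax(dist))
print("most outlying row:", flagged)
```

Any detector that accepts numerical input (for example a nearest-neighbor or density-based method) could replace `score_distance` in this sketch, since the PCs are ordinary numerical features.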

Version published to 10.21203/rs.3.rs-5019199/v1 on Research Square
Oct 4, 2024

Related articles

Robust explicit estimators of the Rayleigh distribution under Type II censoring

This article has 3 authors:
1. Li Luo
2. Zhuanzhuan Ma
3. Min Wang
This article has no evaluations. Latest version: Apr 21, 2025

A Model Combining CTGAN-Based Outlier Detection Mechanism with Ensemble Learning to Mitigate Type II Errors in Diabetes Detection

This article has 7 authors:
1. Dongxiang Liu
2. Zhanfei Ma
3. Xuebao Li
4. Bisheng Wang
5. Jing Jiang
6. HaoYe Luo
7. Hui Wei
This article has no evaluations. Latest version: Mar 28, 2025

A Statistical Approach for Modeling Irregular Multivariate Time Series with Missing Observations

This article has 3 authors:
1. Dingyi Nie
2. Yixing Wu
3. C.-C. Jay Kuo
This article has no evaluations. Latest version: Mar 27, 2025
