A Model Combining CTGAN-Based Outlier Detection Mechanism with Ensemble Learning to Mitigate Type II Errors in Diabetes Detection

Abstract

In machine learning for diabetes detection, outliers in datasets remain a significant challenge. Traditional outlier handling methods often lack accuracy and are prone to Type II errors. They also typically discard outliers, which wastes data. To address these limitations, this study develops an unsupervised outlier detection mechanism that integrates Conditional Tabular Generative Adversarial Networks (CTGAN) with Autoencoders. We further introduce a secondary outlier detection layer based on the Outlier Factor to improve detection accuracy and reduce Type II errors. We then incorporate this mechanism into an ensemble learning framework and propose a novel training method for base learners that retains outliers rather than discarding them. The resulting model architecture performs outlier detection and diabetes classification simultaneously. We evaluate the method on eight outlier detection datasets and three diabetes classification datasets, where it performs strongly. Ablation studies confirm that the proposed dual outlier detection mechanism mitigates Type II errors. Compared to traditional methods, the proposed approach significantly improves outlier detection accuracy, reduces Type II errors, and makes more efficient use of data in diabetes detection models.
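
The dual detection idea can be illustrated with a minimal sketch. This is not the authors' implementation: the CTGAN and Autoencoder stage is replaced here by a simple robust z-score, the Outlier Factor layer by a mean k-nearest-neighbor distance, and the function names, thresholds, and toy data are all hypothetical. It only shows how a second layer can re-examine the samples the first layer accepted, so that fewer true outliers are missed, and how flagged samples can be kept rather than dropped.

```python
import math
import statistics


def stage_one_scores(rows):
    """Stand-in for a reconstruction-error score: largest robust z-score over features."""
    columns = list(zip(*rows))
    medians = [statistics.median(col) for col in columns]
    # Median absolute deviation per feature; fall back to 1.0 to avoid division by zero.
    mads = [
        statistics.median(abs(v - m) for v in col) or 1.0
        for col, m in zip(columns, medians)
    ]
    return [
        max(abs(v - m) / d for v, m, d in zip(row, medians, mads))
        for row in rows
    ]


def knn_distance_scores(rows, k=2):
    """Stand-in for a neighbor-based outlier factor: mean distance to the k nearest rows."""
    scores = []
    for i, row in enumerate(rows):
        dists = sorted(math.dist(row, other) for j, other in enumerate(rows) if j != i)
        scores.append(sum(dists[:k]) / k)
    return scores


def dual_detect(rows, t1=3.5, t2_factor=2.0, k=2):
    """Return (first_layer_flags, second_layer_flags) as sets of row indices.

    t1 and t2_factor are assumed thresholds chosen for this toy example.
    """
    s1 = stage_one_scores(rows)
    first = {i for i, s in enumerate(s1) if s > t1}
    # The second layer only re-examines rows the first layer accepted as inliers.
    candidates = [i for i in range(len(rows)) if i not in first]
    s2 = knn_distance_scores([rows[i] for i in candidates], k)
    cutoff = t2_factor * statistics.median(s2)
    second = {candidates[pos] for pos, s in enumerate(s2) if s > cutoff}
    return first, second


# Toy data: eight points on a line, one point off the line whose individual
# feature values look normal, and one gross outlier.
rows = [(1, 1), (2, 2), (3, 3), (4, 4), (5, 5), (6, 6), (7, 7), (8, 8), (2, 7), (50, 50)]
first, second = dual_detect(rows)

# Keep every row and attach a flag instead of discarding flagged rows.
flagged = first | second
labelled = [(row, i in flagged) for i, row in enumerate(rows)]
print(sorted(first), sorted(second))  # prints [9] [8]
```

In this toy run the first layer catches only the gross outlier, while the point at (2, 7) passes it because each of its coordinates is unremarkable on its own; the second layer flags it because it sits far from its neighbors. The flags are attached to the rows so a downstream learner could still use them.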
