Developing and Using Data Mining to Detect Healthcare Fraud and Abuse for Health Insurance Companies

Ali Malek Hasan
Ranwa Al-Yakzan
Bassel Alkhatib


Abstract

Healthcare fraud and abuse remain a major global challenge, costing billions annually and threatening the sustainability of insurance systems. Traditional manual and rule-based approaches are increasingly ineffective given the scale, complexity, and adaptability of fraudulent schemes. This study develops and validates a data-driven framework leveraging machine learning (ML) to detect fraudulent health insurance claims. Using a large, publicly available dataset, we applied rigorous preprocessing, including feature engineering to create domain-specific predictors and SMOTE resampling applied strictly to the training set to prevent data leakage. Five supervised algorithms (Logistic Regression, Decision Tree, SVM, Random Forest, and XGBoost) were compared against a stacking ensemble that combined Random Forest, XGBoost, and Logistic Regression. Performance was evaluated through stratified 10-fold cross-validation using Accuracy, F1-score, and AUC-ROC. Results show that the ensemble model achieved the best and most balanced performance (F1 = 0.81, AUC = 0.95), significantly outperforming individual classifiers. Feature importance analysis further revealed that the model identified clinically meaningful fraud indicators, such as diagnostic diversity and provider claim frequency. These findings highlight the potential of scalable, open-source ML frameworks to strengthen fraud detection, reduce financial risks, and complement commercial detection tools with cost-effective, interpretable solutions.
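The abstract's point about applying SMOTE strictly to the training set can be illustrated with a minimal sketch. This is not the authors' implementation: it is a simplified, pure-Python version of SMOTE-style interpolation combined with a stratified k-fold split. The function names (stratified_folds, smote, resampled_training_sets), the use of label 1 for the fraudulent minority class, and the choice of 3 nearest neighbours are all assumptions made for illustration.

```python
import random

def stratified_folds(labels, k, seed=0):
    """Return k lists of indices, each with roughly the same class ratio."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        # Deal the shuffled indices of this class round-robin across folds.
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds

def smote(minority, n_new, k_neighbors=3, seed=0):
    """Simplified SMOTE: create n_new synthetic points, each interpolated
    between a minority sample and one of its nearest minority neighbours.
    Assumes at least two minority samples whenever n_new > 0."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = [p for p in minority if p is not base]
        # Sort the remaining minority points by squared distance to base.
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k_neighbors])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic

def resampled_training_sets(X, y, k=10, seed=0):
    """Yield (X_train, y_train, test_idx) for each fold.
    Only the training part is oversampled; the held-out fold is untouched.
    Assumption: label 1 marks the minority (fraud) class."""
    folds = stratified_folds(y, k, seed)
    for f, test_idx in enumerate(folds):
        train_idx = [i for g, fold in enumerate(folds) if g != f for i in fold]
        X_tr = [X[i] for i in train_idx]
        y_tr = [y[i] for i in train_idx]
        minority = [x for x, lab in zip(X_tr, y_tr) if lab == 1]
        # Number of synthetic points needed to balance the two classes.
        n_new = (len(y_tr) - len(minority)) - len(minority)
        new_pts = smote(minority, max(n_new, 0), seed=seed + f)
        yield X_tr + new_pts, y_tr + [1] * len(new_pts), test_idx
```

Each yielded training set is balanced, while the corresponding test indices still point at original, unmodified claims, so no synthetic point derived from a held-out claim can reach the model being evaluated. In practice a maintained library implementation would normally be preferred over this hand-rolled version, which exists only to make the ordering of the split and the resampling explicit.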

Version published to 10.21203/rs.3.rs-7558249/v1 on Research Square
Sep 16, 2025

Related articles

Mining Financial Data for Fraud Detection using Ensemble Learning and Outlier Detection

This article has 2 authors:
1. Manimegalai R
2. Vijayalaskhmi P
This article has no evaluations
Latest version Dec 10, 2025

Comparing Algorithm Effectiveness in Health Data Analysis

This article has 1 author:
1. Abdulmalik Hazaa Alshammari
This article has no evaluations
Latest version Jan 22, 2026

Heart Disease Detection with Machine Learning Algorithms

This article has 2 authors:
1. Fatemeh Hosseinabadi
2. Seyedhassan Sharifi
This article has no evaluations
Latest version Jan 6, 2026
