Application of Machine Learning Model in Fraud Identification: A Comparative Study of CatBoost, XGBoost and LightGBM

Abstract

In the digital age, credit card fraud seriously undermines financial stability and consumer trust, and machine learning offers a new approach to detecting it. This study compares the performance of three machine learning models, CatBoost, XGBoost and LightGBM, for credit card fraud detection on a credit card transaction dataset containing more than 1.85 million records. The models are evaluated with stratified K-fold cross-validation, using F1 score, accuracy and recall as evaluation metrics. The results show that CatBoost has the best overall performance, with an F1 score of 0.9161, balancing the accurate identification of fraudulent transactions against the detection of all fraudulent transactions. The study also identifies the 10 most important features influencing model predictions, including transaction amount, cardholder age and city population. These features provide a key basis for building and optimizing fraud detection models. Based on these results, it is suggested that financial institutions build their models around CatBoost, expand the range of data features and strengthen real-time monitoring to improve the accuracy and timeliness of fraud detection. This study provides a valuable reference for credit card fraud detection and helps advance financial security technology.
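
The evaluation protocol named in the abstract (K-fold cross-validation that preserves the class ratio in every fold, scored by F1) can be illustrated with a minimal, dependency-free sketch. This is not the authors' code: the helper names, the synthetic transaction amounts, and the toy threshold "model" are all hypothetical, and the toy model merely stands in for CatBoost, XGBoost or LightGBM so that the example runs with the Python standard library alone.

```python
import random
from collections import defaultdict


def stratified_k_fold(labels, k, seed=0):
    """Yield (train_idx, test_idx) pairs whose folds keep the class ratio."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(k)]
    for idxs in by_class.values():
        rng.shuffle(idxs)
        # Deal each class's indices round-robin so every fold gets its share.
        for pos, i in enumerate(idxs):
            folds[pos % k].append(i)
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, test


def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall and F1 for the positive (fraud) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def cross_validate_f1(features, labels, fit, k=5, seed=0):
    """Mean F1 over stratified folds; fit(X, y) must return a predict function."""
    scores = []
    for train, test in stratified_k_fold(labels, k, seed):
        predict = fit([features[i] for i in train], [labels[i] for i in train])
        y_true = [labels[i] for i in test]
        y_pred = [predict(features[i]) for i in test]
        scores.append(precision_recall_f1(y_true, y_pred)[2])
    return sum(scores) / len(scores)


def fit_amount_threshold(X, y):
    """Hypothetical toy model: flag amounts above a learned cut-off as fraud."""
    fraud = [x for x, label in zip(X, y) if label == 1]
    legit = [x for x, label in zip(X, y) if label == 0]
    cut = (min(fraud) + max(legit)) / 2
    return lambda x: 1 if x > cut else 0


# Synthetic, heavily imbalanced data: 95 legitimate and 5 fraudulent amounts.
amounts = [float(a) for a in range(1, 96)] + [500.0 + a for a in range(5)]
labels = [0] * 95 + [1] * 5
mean_f1 = cross_validate_f1(amounts, labels, fit_amount_threshold, k=5)
print(f"mean F1 over 5 stratified folds: {mean_f1:.4f}")
```

Stratification matters here because fraud is rare: with plain random folds, some test folds could contain no fraudulent transactions at all, which would make recall and F1 undefined or misleading for those folds. Comparing real models would amount to passing a different `fit` function for each one while keeping the folds and the metric fixed.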
