Loss Function Matters More Than Framework: A Comparative Study of Gradient Boosting Robustness to Outliers
Abstract
We present a systematic empirical study comparing the robustness of four major tree-based ensemble algorithms — XGBoost, LightGBM, CatBoost, and Random Forest — to controlled training-data contamination. Unlike prior work that compares frameworks as monolithic units, we test multiple loss functions (MSE, Huber, MAE) within each boosting framework, yielding 13 regression and 5 classification configurations. Experiments on the California Housing, Kaggle House Prices, and Adult Census Income datasets at contamination levels of 0–40% reveal that the choice of loss function affects robustness far more than the choice of framework: the within-framework spread of our retention index averages 0.63, exceeding the between-framework spread of 0.51. LightGBM with MAE loss retains 96.6% of its R² at 40% label noise, while the same framework with MSE loss retains only 26.6%. Random Forest ranks only 8th out of 12 configurations. We provide theoretical justification through influence-function analysis, report an anomalous collapse of Huber loss when miscalibrated, and propose the retention index as a standardized measure of robustness. For classification under symmetric label noise, CatBoost achieves the highest MCC retention (71.1%), significantly outperforming Random Forest (60.7%).
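The retention index proposed in the abstract can be illustrated with a minimal sketch. The definition below is an assumption inferred from the phrasing "retains 96.6% of R²": the ratio of a higher-is-better metric measured after training on contaminated data to the same metric measured after training on clean data. The function name and the example values are hypothetical, not taken from the paper.

```python
def retention_index(metric_clean: float, metric_contaminated: float) -> float:
    """Assumed definition: ratio of contaminated-training performance to
    clean-training performance for a higher-is-better metric (e.g. R², MCC).

    A value near 1.0 means the model is essentially unaffected by the
    contamination; a value near 0.0 indicates collapse.
    """
    if metric_clean <= 0:
        raise ValueError("clean-data metric must be positive for the ratio to be meaningful")
    return metric_contaminated / metric_clean

# Hypothetical illustration: a model scoring R² = 0.80 on clean training data
# and R² = 0.60 after 40% label contamination retains 75% of its performance.
print(retention_index(0.80, 0.60))  # → 0.75
```

Under this reading, the abstract's headline numbers (96.6% for LightGBM+MAE vs 26.6% for LightGBM+MSE at 40% label noise) are retention-index values reported as percentages.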