Benchmarking of Ensembles and Meta‐Ensembles in the Multiclass Classification of Obesity Risk: Predictive Performance, Calibration and Interpretability


Abstract

Obesity represents a significant public health concern, attributable to its high prevalence and its association with cardiometabolic comorbidities. This study compared a set of ensemble learning models, including canonical ensembles, meta-ensembles, and baselines for tabular data, in a multiclass obesity status prediction task using the "Obesity Dataset" (n = 1,610; 14 predictors; 4 classes). To ensure methodological rigor, a pipeline was implemented using ColumnTransformer, standardization, one-hot encoding, and rebalancing via SMOTENC applied exclusively to the training folds, thereby preventing data leakage. Performance was evaluated using accuracy, F1-score, precision, recall, Cohen's kappa, and the Matthews correlation coefficient, supplemented by a computational cost analysis. Inferential comparisons were carried out with the Friedman test and the Nemenyi post-hoc test (α = 0.05). The findings indicated a high level of overall performance (≈89–90.5% accuracy) and identified a leading group of statistically indistinguishable models (Group A): LightGBM (90.49% ± 1.38), Random Forest (90.16% ± 1.70), Stacking (90.21% ± 1.70), and Extra Trees (89.69% ± 1.55). XGBoost, Bagging, and CatBoost showed competitive performance with partial statistical overlap, whereas Gradient Boosting and AdaBoost performed significantly worse. In summary, no single dominant model emerged; rather, a set of statistically equivalent solutions was identified. Model selection should therefore balance accuracy, computational cost, and interpretability: Random Forest and Extra Trees are efficient options, and Stacking is a valid alternative when maximizing predictive performance is the priority.