Benchmarking of Ensembles and Meta‐Ensembles in the Multiclass Classification of Obesity Risk: Predictive Performance, Calibration and Interpretability


Abstract

Obesity represents a significant public health concern, attributable to its high prevalence and its association with cardiometabolic comorbidities. This study compared a set of ensemble learning models, including canonical ensembles, meta-ensembles, and baselines for tabular data, in a multiclass obesity status prediction task using the "Obesity Dataset" (n = 1,610; 14 predictors; 4 classes). To ensure methodological rigor, a pipeline was implemented using ColumnTransformer, standardization, one-hot encoding, and rebalancing via SMOTENC applied exclusively to the training folds, thereby preventing data leakage. Performance was evaluated using accuracy, F1-score, precision, recall, Cohen's kappa, and the Matthews correlation coefficient, supplemented by a computational cost analysis. Inferential comparisons were carried out with the Friedman test and the Nemenyi post-hoc test (α = 0.05). The findings indicated a high level of overall performance (≈89–90.5% accuracy) and identified a leading group of statistically indistinguishable models (Group A): LightGBM (90.49% ± 1.38), Random Forest (90.16% ± 1.70), Stacking (90.21% ± 1.70), and Extra Trees (89.69% ± 1.55). XGBoost, Bagging, and CatBoost showed competitive performance with partial statistical overlap, whereas Gradient Boosting and AdaBoost performed significantly worse. In summary, no single dominant model emerged; rather, a set of statistically equivalent solutions was identified. Model selection should therefore balance accuracy, computational cost, and interpretability: Random Forest and Extra Trees are efficient options, and Stacking is a valid alternative when maximizing predictive performance is the priority.