Tackling the problem of multi-class imbalanced classification on species distribution models using machine learning techniques
Abstract
The multi-class imbalanced classification problem arises in ecology because fish species are unevenly represented in the dataset. Certain fish species may be more prevalent in the environment or have more occurrence records, resulting in a larger number of samples from specific habitats. To tackle this challenge within species distribution models, ensemble machine learning (ML) techniques were combined with resampling techniques such as random over-sampling. Five ensemble classifiers were evaluated on both the imbalanced and the balanced fish species datasets: random forests (RF), bagging classification trees (bagCART), gradient boosting models (GBM), extreme gradient boosting models (XGB) and adaptive boosting models (AdaBoost). The F1-score and G-mean were used to choose the most suitable classifier for the dataset. The RF and bagCART classifiers achieved strong F1-score and G-mean values, indicating a robust ability to generalize to minority classes and to handle resampled data across classes. A high G-mean value indicates that no class is neglected. Hence, it is concluded that combining ensemble ML techniques with resampling techniques is necessary to effectively address the multi-class imbalanced classification problem in species distribution modeling. The strong performance of RF and bagCART is evidence that ensemble ML techniques can be used to predict species habitat suitability.
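The resampling step and the two selection metrics named in the abstract can be illustrated with a minimal sketch in plain Python. This is not the authors' implementation. It assumes that random over-sampling duplicates minority-class samples until every class matches the largest one, that the G-mean is the geometric mean of the per-class recalls, and that the F1-score is macro-averaged over classes; the abstract does not state these details. The function names and the species labels are hypothetical.

```python
import math
import random
from collections import Counter, defaultdict


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen samples of each smaller class until every
    class has as many samples as the largest class (assumed scheme)."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_class[label].append(sample)
    target = max(len(group) for group in by_class.values())
    out_samples, out_labels = [], []
    for label, group in by_class.items():
        extra = [rng.choice(group) for _ in range(target - len(group))]
        out_samples.extend(group + extra)
        out_labels.extend([label] * target)
    return out_samples, out_labels


def _confusion_counts(y_true, y_pred):
    """Per-class true positives, false positives and false negatives."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fn[truth] += 1
            fp[pred] += 1
    return sorted(set(y_true)), tp, fp, fn


def g_mean(y_true, y_pred):
    """Geometric mean of per-class recalls (assumed multi-class definition)."""
    classes, tp, _, fn = _confusion_counts(y_true, y_pred)
    recalls = [tp[c] / (tp[c] + fn[c]) for c in classes]
    return math.prod(recalls) ** (1 / len(recalls))


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1-scores (assumed averaging)."""
    classes, tp, fp, fn = _confusion_counts(y_true, y_pred)
    scores = []
    for c in classes:
        denom = 2 * tp[c] + fp[c] + fn[c]
        scores.append(2 * tp[c] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical labels: one common species and one rare species.
y_true = ["species_a"] * 8 + ["species_b"] * 2
y_pred = ["species_a"] * 10  # a classifier that ignores the rare species
print("G-mean:", g_mean(y_true, y_pred))
print("macro F1:", macro_f1(y_true, y_pred))
```

In the toy example, the classifier is right on 8 of 10 samples, yet its G-mean is 0 because the recall for the rare species is 0. This is the sense in which a high G-mean indicates that no class is neglected.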