Hydrological modeling in a highly urbanized watershed using explainable machine learning and sub-hourly data: A case study in the city of Sao Paulo, Brazil


Abstract

Hydrological modeling of urbanized watersheds is highly challenging because the rainfall-runoff relationship in these areas is complex and non-linear. Many data-driven models have been proposed in the literature to address this problem. However, the field needs not only predictive performance but also explainability and an understanding of the influence of hydrometeorological factors. This study presents a detailed comparative analysis of ensemble machine learning models within an explainable framework. We explore feature engineering and feature selection techniques to determine the best set of predictors when the data are non-continuous, a common problem in real-world scenarios. Among the models analysed, CatBoost was the best-performing algorithm in most cases, and, in general, all the ensemble algorithms performed well for forecasting horizons of up to 3 hours. An analysis based on SHAP values revealed insights into the spatial and temporal dynamics of the rainfall-runoff relationship.
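Feature engineering on non-continuous sub-hourly records usually means building lagged predictors and a target some steps ahead without silently bridging gaps. The sketch below is not the authors' procedure. It illustrates one common approach under stated assumptions: a fixed 10-minute sampling step, one rainfall and one discharge value per timestamp, and a sample kept only when its whole lag window and its target timestamp exist. The function name `build_supervised_samples` and its parameters are hypothetical.

```python
from datetime import datetime, timedelta


def build_supervised_samples(records, step, lags, horizon):
    """Turn an irregular (gappy) series into (timestamp, features, target) samples.

    Hypothetical helper, for illustration only.
    records: dict mapping timestamp -> (rainfall, discharge)
    step:    nominal sampling interval (assumed fixed, e.g. 10 minutes)
    lags:    number of time steps used as predictors (k = 0 is the current step)
    horizon: number of steps ahead at which discharge is forecast
    A sample is kept only if every lagged timestamp and the target timestamp
    are present, so gaps in the record are never interpolated across.
    """
    samples = []
    for t in sorted(records):
        past = [t - k * step for k in range(lags)]
        future = t + horizon * step
        if all(p in records for p in past) and future in records:
            features = []
            for p in past:
                rain, flow = records[p]
                features.extend([rain, flow])
            target = records[future][1]  # discharge at t + horizon * step
            samples.append((t, features, target))
    return samples


# Usage with synthetic values: 8 slots at a 10-minute step, slot 4 missing.
start = datetime(2020, 1, 1, 0, 0)
step = timedelta(minutes=10)
records = {start + i * step: (float(i), 10.0 + i) for i in range(8) if i != 4}
samples = build_supervised_samples(records, step, lags=2, horizon=1)
for t, features, target in samples:
    print(t, features, target)
```

With the assumed 10-minute step, a 3-hour horizon corresponds to `horizon=18`. Dropping incomplete windows shrinks the training set around gaps, but it prevents a model from learning from lag values that were never observed.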
