Comparison of Process-Based and Machine Learning Models for Streamflow Simulation in Typical Basins in Northern and Southern China

Abstract

Accurate streamflow forecasting is vital for sustainable water resource management but remains challenging due to pronounced spatiotemporal variability. This study evaluates two process-based models, the comprehensive SWAT and the parsimonious GWLF, and a data-driven random forest (RF) model for monthly streamflow simulation in two contrasting Chinese basins: the humid southern basin (SSB) and the semi-arid northern basin (SRB). Using four statistical metrics (NSE, R², MAE, RMSE), we assess model accuracy, robustness in capturing extremes, and sensitivity to hydrological characteristics and data availability. All models perform consistently better in the SSB than in the SRB. SWAT achieves the highest overall accuracy, especially for peak flows, owing to its physically based structure. The GWLF provides acceptable simulations with minimal data requirements, offering a practical alternative in data-limited regions such as the SRB. RF performs well in the SSB under zero-lag conditions but requires hydrologically informed lag structures in the SRB, and it consistently underestimates high flows because it lacks physical constraints. The findings underscore that model selection must be guided not only by predictive performance but also by the underlying hydrological context, data availability, and the need for physical realism in decision-making.
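
For readers unfamiliar with the four metrics named in the abstract, the following is a minimal sketch of how they are commonly computed from paired observed and simulated monthly flows. It is not taken from the article: the function names are hypothetical, and R² is assumed here to be the squared Pearson correlation, which is a common hydrological convention but may differ from the definition the authors used.

```python
import math


def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the observed mean."""
    mean_obs = sum(obs) / len(obs)
    residual = sum((o - s) ** 2 for o, s in zip(obs, sim))
    variance = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - residual / variance


def r2(obs, sim):
    """Squared Pearson correlation (assumed definition of R2)."""
    mean_obs = sum(obs) / len(obs)
    mean_sim = sum(sim) / len(sim)
    cov = sum((o - mean_obs) * (s - mean_sim) for o, s in zip(obs, sim))
    var_obs = sum((o - mean_obs) ** 2 for o in obs)
    var_sim = sum((s - mean_sim) ** 2 for s in sim)
    return cov ** 2 / (var_obs * var_sim)


def mae(obs, sim):
    """Mean absolute error, in the units of the flow series."""
    return sum(abs(o - s) for o, s in zip(obs, sim)) / len(obs)


def rmse(obs, sim):
    """Root mean square error; penalizes large misses (e.g. peak flows) more than MAE."""
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(obs, sim)) / len(obs))


if __name__ == "__main__":
    # Illustrative values only, not data from the study.
    observed = [10.0, 20.0, 30.0, 40.0]
    simulated = [12.0, 18.0, 33.0, 37.0]
    for name, fn in (("NSE", nse), ("R2", r2), ("MAE", mae), ("RMSE", rmse)):
        print(name, round(fn(observed, simulated), 3))
```

Because RMSE squares each error, a model that underestimates high flows, as the abstract reports for RF, tends to show a larger gap between RMSE and MAE than a model whose errors are spread evenly.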
