Does More Data Always Help? Input Configuration Impacts on LSTM-based Water Level Prediction
Abstract
Floods are among the most devastating natural disasters, necessitating efficient early-warning systems. Machine-learning surrogates are now widely adopted for this task, and increasing data volume is often assumed to improve their performance. Using 17 flood events (2012–2024) from the data-scarce Pajiang River detention basin, we quantitatively test whether "more data" necessarily translates into better multi-step water-level forecasts. Three Long Short-Term Memory (LSTM) network input scenarios were designed: using only historical data from the target station (S1), using only upstream station data (S2), and combining both data sources (S3). Surprisingly, expanding the input matrix from S1 to S3 yielded no accuracy gain and even degraded skill beyond the 4-h lead time (NSE decreased from 0.97 to 0.44 and peak bias increased from 0.25 to 1.88 m). The highest accuracy at 1–2 h prediction horizons was achieved with the smallest input set (S1), whereas the most robust longer-lead forecasts (3–4 h) were produced with the moderate set (S2). Parsimonious inputs reduced over-fitting risk and kept uncertainty within operational thresholds. Our findings caution against unchecked input inflation in data-limited basins and highlight the need for input-selection protocols prior to model deployment.
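To make the three input scenarios concrete, the sketch below shows one possible way to assemble S1–S3 as sliding-window LSTM inputs. This is not the authors' code: the window length (12 h), forecast horizon (4 h), network size, and the synthetic stand-in series are all assumptions for illustration only.

```python
# Illustrative sketch (not the study's implementation): constructing the
# S1/S2/S3 input configurations for a multi-step LSTM water-level forecaster.
# Assumed: hourly series `target` (forecast station) and `upstream`
# (upstream station) as equal-length 1-D NumPy arrays.
import numpy as np
import torch
import torch.nn as nn

LOOKBACK = 12   # assumed input window length (hours)
HORIZON = 4     # assumed lead times: 1-4 h ahead

def make_windows(features, target, lookback=LOOKBACK, horizon=HORIZON):
    """Slice a (n_steps, n_features) series into LSTM training samples.

    X: (n_samples, lookback, n_features)
    y: (n_samples, horizon) -- water levels at 1..horizon h lead time.
    """
    X, y = [], []
    for t in range(lookback, len(target) - horizon + 1):
        X.append(features[t - lookback:t])
        y.append(target[t:t + horizon])
    return np.stack(X), np.stack(y)

# Synthetic random-walk data standing in for the observed water levels.
rng = np.random.default_rng(0)
target = rng.normal(size=2000).cumsum()     # target-station water level
upstream = rng.normal(size=2000).cumsum()   # upstream-station water level

# S1: target-station history only; S2: upstream history only; S3: both.
scenarios = {
    "S1": np.column_stack([target]),
    "S2": np.column_stack([upstream]),
    "S3": np.column_stack([target, upstream]),
}

class LSTMForecaster(nn.Module):
    """Plain LSTM regressor mapping a lookback window to multi-step levels."""
    def __init__(self, n_features, hidden=32, horizon=HORIZON):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, horizon)

    def forward(self, x):
        out, _ = self.lstm(x)          # out: (batch, lookback, hidden)
        return self.head(out[:, -1])   # forecast from the last hidden state

for name, feats in scenarios.items():
    X, y = make_windows(feats, target)
    model = LSTMForecaster(n_features=X.shape[-1])
    pred = model(torch.tensor(X, dtype=torch.float32))
    print(name, "X:", X.shape, "y:", y.shape, "pred:", tuple(pred.shape))
```

Under this reading, the scenarios differ only in the feature columns fed to an otherwise identical network, so any skill difference reflects the input configuration rather than the architecture.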