Geospatial Machine Learning for Predicting Flash Flood Response at Ungauged Appalachian Watersheds: Terrain, Soil, and Land Cover Controls

Sujan Bhattarai

Abstract

Flash floods remain among the deadliest weather hazards in the United States, yet the majority of flood-prone watersheds in the Appalachian region lack streamflow monitoring. Predicting flood response characteristics at these ungauged sites requires understanding which landscape properties control hydrologic behavior. This study evaluates whether geospatial basin descriptors derived from high-resolution terrain, soil, and land cover datasets can predict seven flood response metrics across 49 gauged Appalachian watersheds spanning seven states (Virginia, West Virginia, North Carolina, Tennessee, Kentucky, Georgia, and Pennsylvania). Predictor variables were extracted from the USGS 3D Elevation Program (10 m), the National Land Cover Database (30 m), and the NRCS Soil Data Access service. Four model families were compared using leave-one-out spatial cross-validation: regularized linear models (Ridge, ElasticNet), tree-based models (Random Forest, XGBoost), and Gaussian Process Regression (GPR) with multiple kernel configurations. Results show that GPR with a Matern 1.5 kernel achieves the highest predictive skill for the Q95 discharge ratio (R-squared = 0.46) and mean rise rate (R-squared = 0.73), while regularized linear models perform comparably or better for other targets. Flashiness index and coefficient of variation of annual peaks are not predictable from static geospatial descriptors (R-squared approximately equal to 0), indicating that these properties depend on storm characteristics rather than landscape attributes. Spearman correlation analysis identifies basin relief (rho = -0.58, p < 0.001) and drainage area (rho = -0.42, p < 0.01) as the strongest correlates of flood response. SHAP-based feature importance analysis confirms that terrain properties dominate across most targets, contributing 42 to 69 percent of total importance. 
GPR prediction intervals are well calibrated: nominal 95 percent intervals achieve observed coverage of 88 to 95 percent across targets. These findings suggest that geospatial machine learning can provide moderate predictive skill for flood magnitude indicators at ungauged Appalachian sites, but flashiness metrics require dynamic storm-event information that static basin descriptors cannot capture.
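
The evaluation described in the abstract (leave-one-out cross-validation of a Gaussian process with a Matern 1.5 kernel, scored by R-squared and by the observed coverage of nominal 95 percent intervals) can be illustrated with a minimal sketch. This is not the author's pipeline. The data are synthetic, the three predictors are hypothetical stand-ins for basin descriptors, the kernel hyperparameters are fixed rather than optimised, and the folds are plain leave-one-out with no spatial blocking. The function names `matern15`, `gp_predict`, and `loo_evaluate` are illustrative only.

```python
import numpy as np


def matern15(A, B, length_scale=1.0, variance=1.0):
    """Matern kernel with smoothness nu = 1.5 between the rows of A and B."""
    d = np.sqrt(((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1))
    s = np.sqrt(3.0) * d / length_scale
    return variance * (1.0 + s) * np.exp(-s)


def gp_predict(X_train, y_train, X_test, noise=0.1):
    """GP posterior mean and predictive std. Hyperparameters are assumed, not fitted."""
    K = matern15(X_train, X_train) + noise * np.eye(len(X_train))
    K_s = matern15(X_test, X_train)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s @ alpha
    v = np.linalg.solve(L, K_s.T)
    # Predictive variance of a new observation: prior + noise - explained part.
    var = matern15(X_test, X_test).diagonal() + noise - (v ** 2).sum(axis=0)
    return mean, np.sqrt(np.maximum(var, 1e-12))


def loo_evaluate(X, y):
    """Hold out each watershed once; return predictions, stds, R^2, 95% coverage."""
    n = len(y)
    mean = np.empty(n)
    std = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        # Scale with training-fold statistics only, to avoid leakage.
        mu = X[keep].mean(axis=0)
        sd = X[keep].std(axis=0)
        y_mu = y[keep].mean()
        m, s = gp_predict((X[keep] - mu) / sd, y[keep] - y_mu,
                          (X[i:i + 1] - mu) / sd)
        mean[i] = m[0] + y_mu
        std[i] = s[0]
    r2 = 1.0 - ((y - mean) ** 2).sum() / ((y - y.mean()) ** 2).sum()
    coverage = float(np.mean(np.abs(y - mean) <= 1.96 * std))
    return mean, std, r2, coverage


# Synthetic stand-in for 49 watersheds with three hypothetical descriptors.
rng = np.random.default_rng(0)
X = rng.normal(size=(49, 3))
y = -0.6 * X[:, 0] - 0.4 * X[:, 1] + rng.normal(scale=0.3, size=49)

mean, std, r2, coverage = loo_evaluate(X, y)
print(f"LOO R^2: {r2:.2f}")
print(f"Observed coverage of nominal 95% intervals: {coverage:.2f}")
```

Observed coverage close to 0.95 indicates that the predictive standard deviations are neither too narrow nor too wide. A spatially blocked variant would hold out neighbouring watersheds together with the test site, which this sketch does not attempt.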

Version published to 10.31223/x5zb5f
Apr 14, 2026

Event Aware Flood Mapping for Agricultural Landscapes: A Robustness Oriented Comparison of Deep Learning and Machine Learning in the Arkansas 2025 Flood Event

This article has 3 authors:
1. Manh-Dung Vu
2. Gia-Hien Tran
3. Ming-Che Hu
This article has no evaluations
Latest version Apr 14, 2026
Rainfall–Road Synergy and Landslide Risk Mapping in the Nepal Himalayas: A GIS–MCDA Framework with Level-4 Citizen Science Validation

This article has 4 authors:
1. Narayan Thapa
2. Sushant Sharma
3. Reshma Shrestha
4. Mukesh Thapa
This article has no evaluations
Latest version Apr 17, 2026
A Transferable Machine Learning Approach for Identifying Rainfall-Induced Cliff-Type (Shallow) Landslides in Seismic and Non-Seismic Regions

This article has 3 authors:
1. Sushama De Silva
2. Taro Uchimura
3. Pang-jo Chun
This article has no evaluations
Latest version Jun 2, 2026
