Geospatial Machine Learning for Predicting Flash Flood Response at Ungauged Appalachian Watersheds: Terrain, Soil, and Land Cover Controls
Abstract
Flash floods remain among the deadliest weather hazards in the United States, yet most flood-prone watersheds in the Appalachian region lack streamflow monitoring. Predicting flood response characteristics at these ungauged sites requires understanding which landscape properties control hydrologic behavior. This study evaluates whether geospatial basin descriptors derived from high-resolution terrain, soil, and land cover datasets can predict seven flood response metrics across 49 gauged Appalachian watersheds spanning seven states (Virginia, West Virginia, North Carolina, Tennessee, Kentucky, Georgia, and Pennsylvania). Predictor variables were extracted from the USGS 3D Elevation Program (10 m), the National Land Cover Database (30 m), and the NRCS Soil Data Access service. Four model families were compared using leave-one-out spatial cross-validation: regularized linear models (Ridge, ElasticNet), tree-based models (Random Forest, XGBoost), and Gaussian Process Regression (GPR) with multiple kernel configurations. Results show that GPR with a Matérn kernel (ν = 1.5) achieves the highest predictive skill for the Q95 discharge ratio (R² = 0.46) and mean rise rate (R² = 0.73), while regularized linear models perform comparably or better for the other targets. The flashiness index and the coefficient of variation of annual peaks are not predictable from static geospatial descriptors (R² ≈ 0), suggesting that these properties depend on storm characteristics rather than landscape attributes. Spearman correlation analysis identifies basin relief (ρ = -0.58, p < 0.001) and drainage area (ρ = -0.42, p < 0.01) as the strongest correlates of flood response. SHAP-based feature importance analysis confirms that terrain properties dominate across most targets, contributing 42 to 69 percent of total importance.
GPR prediction intervals are well calibrated, with empirical coverage of the nominal 95 percent intervals ranging from 88 to 95 percent across targets. These findings suggest that geospatial machine learning can provide moderate predictive skill for flood magnitude indicators at ungauged Appalachian sites, but flashiness metrics require dynamic storm-event information that static basin descriptors cannot capture.
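The two evaluation quantities reported in the abstract, cross-validated R² and the empirical coverage of 95 percent prediction intervals, can be illustrated with a minimal sketch. It is not the authors' code. It assumes that the held-out predictions from the leave-one-out folds are pooled before R² is computed (a single-site fold has no R² of its own) and that each prediction interval is a symmetric Gaussian interval built from a predictive mean and standard deviation. The function names `interval_coverage` and `pooled_r2` and the toy numbers are hypothetical.

```python
from statistics import NormalDist, fmean


def interval_coverage(y_obs, pred_mean, pred_std, level=0.95):
    """Fraction of observations inside the central Gaussian prediction interval."""
    # Two-sided z multiplier, about 1.96 for a 95 percent interval.
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    inside = [
        abs(y - m) <= z * s
        for y, m, s in zip(y_obs, pred_mean, pred_std)
    ]
    return sum(inside) / len(inside)


def pooled_r2(y_obs, y_pred):
    """R-squared over pooled held-out predictions (one per left-out site)."""
    y_bar = fmean(y_obs)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_obs, y_pred))
    ss_tot = sum((y - y_bar) ** 2 for y in y_obs)
    return 1.0 - ss_res / ss_tot


# Toy held-out values for five hypothetical watersheds (not data from the study).
y_obs = [1.0, 2.0, 3.0, 4.0, 5.0]
pred_mean = [1.2, 1.9, 3.5, 3.0, 5.1]
pred_std = [0.2, 0.2, 0.2, 0.2, 0.2]

coverage = interval_coverage(y_obs, pred_mean, pred_std)
r2 = pooled_r2(y_obs, pred_mean)
print(f"coverage={coverage:.2f}, R2={r2:.2f}")  # coverage=0.60, R2=0.87
```

In this toy case only three of the five observations fall inside their intervals, so the empirical coverage (0.60) falls well short of the nominal 0.95 even though R² is high. This is why calibration is reported separately from predictive skill.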