Distribution-Informed Machine Learning for Flash Flood Susceptibility: Integrating Weibull Extreme Value Theory with Interpretable Models
Abstract
Flash floods are among the deadliest weather-related hazards globally, yet their prediction remains fundamentally challenged by extreme class imbalance in observational data. This study addresses a critical methodological crisis: traditional evaluation metrics, both overall accuracy and Area Under the ROC Curve (AUC), are profoundly misleading for rare-event prediction. We demonstrate empirically that models achieving 94% accuracy and AUC exceeding 0.95 can simultaneously fail to detect 40% of flood events. Moving beyond conventional approaches, we introduce a paradigm shift from ad hoc feature engineering to distribution-theory-informed feature generation. By integrating Extreme Value Theory through Weibull distribution analysis, we derive 24 features from rigorous statistical characterization of precipitation extremes rather than from heuristic transformations. Evaluating seven model configurations for flash flood susceptibility in Nova Scotia, Canada, using Environment and Climate Change Canada operational warning thresholds, we find that an Artificial Neural Network with selected features achieves 90% recall and 90.6% balanced accuracy. SHAP analysis reveals that the intensity-duration product, a distribution-informed physical process feature, dominates predictions with mean |SHAP| = 0.127, validating both hydrological understanding and the distribution-informed framework. These findings provide essential guidance for practitioners: comprehensive reporting of balanced accuracy, precision, and recall is mandatory for imbalanced datasets where traditional metrics mask operational failure.
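The gap between overall accuracy and recall described in the abstract can be illustrated with a small worked example. The sketch below is not the study's data or code: the confusion-matrix counts (1,000 samples, 50 flood events) and the function name `imbalance_metrics` are hypothetical, chosen only so that accuracy comes out at 94% while 40% of flood events are missed. AUC is omitted because it cannot be computed from a single confusion matrix.

```python
def imbalance_metrics(tp: int, fn: int, fp: int, tn: int) -> dict:
    """Compute accuracy, recall, specificity, precision, and balanced accuracy
    from binary confusion-matrix counts (positive class = flood event)."""
    total = tp + fn + fp + tn
    recall = tp / (tp + fn)            # share of flood events detected
    specificity = tn / (tn + fp)       # share of non-events correctly rejected
    precision = tp / (tp + fp)         # share of flood warnings that were real
    return {
        "accuracy": (tp + tn) / total,
        "recall": recall,
        "specificity": specificity,
        "precision": precision,
        "balanced_accuracy": (recall + specificity) / 2,
    }


# Hypothetical counts (an assumption for illustration, not from the study):
# 50 flood events out of 1,000 samples; 30 detected, 20 missed, 40 false alarms.
metrics = imbalance_metrics(tp=30, fn=20, fp=40, tn=910)

for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these illustrative counts, accuracy is 0.940 while recall is only 0.600, precision about 0.429, and balanced accuracy about 0.779. The high accuracy is driven almost entirely by the 950 non-events, which is why the abstract recommends reporting balanced accuracy, precision, and recall together.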