Stochastic Hardness in Model Risk: A Framework for Tangible and Auditable Assessments in Banking
Abstract
Banking executives need quantitative, auditable tools for assessing model reliability that satisfy regulatory requirements while reducing operational overhead. This paper introduces a statistical stress-testing framework for model reliability assessment that provides explicit pass/fail thresholds for regulatory compliance. Our approach replaces subjective model validation with objective, data-driven certification through tail-exponent analysis and summability testing. The framework aims to deliver three things: a polynomial-tail threshold at $p = 1$ as a concrete decision criterion, automated monitoring systems that integrate with existing infrastructure, and comprehensive audit trails that strengthen regulatory confidence. Alignment with OCC 2011-12, SR 11-7, and Basel III requirements is a design objective of the methodology, which may also provide superior risk detection compared to traditional backtesting approaches. We present a theoretical framework with operationalization pathways, without claiming realized financial impacts or supervisory approvals. Our choice of tail-exponent analysis is motivated by its direct link to computational complexity: when error rates decay faster than $1/n$, reliability scales polynomially with input size; slower decay signals a potential NP-hard-like explosion of unreliability. This makes the tail index a natural diagnostic for distinguishing tractable from intractable model behavior. We motivate this approach via the WC--SP dimension~\cite{DyerStougie}, which contrasts worst-case intractability (WC, used here as shorthand for worst-case, NP-hard-like regimes) with distributional stochastic polynomial-time behavior (SP) on operational inputs. The critical issue is whether reliability scales tractably on typical data; tail-exponent monitoring offers an auditable proxy for staying in the SP-like regime without invoking worst-case assumptions.
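As a rough illustration of the $p = 1$ decision criterion, the sketch below fits a tail exponent to error rates observed at increasing input sizes and checks it against the threshold. It assumes error rates follow $\varepsilon(n) \approx C\,n^{-p}$ and uses a plain log-log least-squares fit, which is an assumption made here for illustration and not necessarily the estimator used in the paper. The function names `estimate_tail_exponent` and `passes_summability_threshold` and the synthetic data are hypothetical. The pass condition $p > 1$ reflects the standard fact that $\sum_n n^{-p}$ converges exactly when $p > 1$.

```python
import math

def estimate_tail_exponent(sizes, error_rates):
    """Estimate p in error ~ C * n**(-p) by a least-squares fit of
    log(error) against log(n). Illustrative estimator only."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(e) for e in error_rates]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # The fitted slope is -p, so negate it to recover the exponent.
    return -sxy / sxx

def passes_summability_threshold(p_hat, threshold=1.0):
    """Pass only if errors decay strictly faster than 1/n (p > 1)."""
    return p_hat > threshold

# Synthetic data (hypothetical): one fast-decaying and one slow-decaying model.
sizes = [10, 100, 1000, 10000]
fast_decay = [n ** -1.5 for n in sizes]
slow_decay = [n ** -0.5 for n in sizes]

p_fast = estimate_tail_exponent(sizes, fast_decay)
p_slow = estimate_tail_exponent(sizes, slow_decay)
print(passes_summability_threshold(p_fast))
print(passes_summability_threshold(p_slow))
```

In practice the error rates would come from monitored model outputs, and a noisy fit would call for a confidence interval around the estimated exponent before recording a pass or fail.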