The Contextual Imperative: A Utility-Based Framework for Forensic LLM Risk Assessment

Abstract

Background: As Large Language Models (LLMs) transition from generative assistants to decision-support tools in forensic risk assessment, traditional performance metrics such as AUC-ROC and accuracy are proving insufficient. These metrics fail to account for the asymmetric "costs" of forensic errors (e.g., unnecessary detention vs. public-safety risks) and the shifting baseline of risk across different legal settings.

Methods: We propose the Contextual Imperative, a framework asserting that forensic AI utility is a function of outcome prevalence and the normative "cost" ratio of the decision-making environment. Adhering to the newly released TRIPOD-LLM reporting guidelines (Gallifant et al., 2025), we employ Decision Curve Analysis (DCA) to define the "Moral Equilibrium": the point at which the harm of unnecessary detention equals the benefit of violence prevention.

Results: Our sensitivity analysis reveals a "Prevalence Trap": in low-prevalence settings (5% base rate), even high-performing models yield a Positive Predictive Value (PPV) below 30%, rendering them ethically unsuitable for custodial decisions. We further demonstrate that model utility is non-linear and confined to a probability threshold ($p_t$) range of 0.05 to 0.45.

Conclusions: Forensic AI must be regulated by its utility, not just its accuracy. We argue for a mandatory "Right to Reason," validated by parallel developments in physical AI (NVIDIA Alpamayo), which requires models to generate traceable, verifiable rationales before suggesting a deprivation of liberty.
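
To make the arithmetic behind the "Prevalence Trap" and the net-benefit threshold concrete, the Python sketch below computes PPV from Bayes' theorem and the standard DCA net-benefit formula. It is a minimal illustration, not the authors' implementation: the 0.80 sensitivity/specificity figures and the single fixed operating point are assumptions, and a full DCA would recompute true- and false-positive rates at every threshold from the model's predicted probabilities.

# Minimal sketch (not the authors' implementation): Bayes-rule PPV and the
# standard DCA net-benefit formula behind the "Prevalence Trap" argument.
# The 0.80 sensitivity/specificity figures are illustrative assumptions.

def ppv(sensitivity, specificity, prevalence):
    """Positive predictive value via Bayes' theorem."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

def net_benefit(tp_rate, fp_rate, p_t):
    """DCA net benefit at threshold probability p_t:
    NB = TP/N - (FP/N) * p_t / (1 - p_t),
    where p_t / (1 - p_t) encodes the harm:benefit ("cost") ratio."""
    return tp_rate - fp_rate * p_t / (1 - p_t)

if __name__ == "__main__":
    sens, spec = 0.80, 0.80                  # hypothetical "high-performing" model
    for prev in (0.05, 0.20, 0.50):          # illustrative base rates across settings
        print(f"prevalence={prev:.2f}  PPV={ppv(sens, spec, prev):.2f}")

    # Net benefit of "detain if flagged" at 5% prevalence, one fixed operating point.
    # A full DCA recomputes TP/FP rates at each threshold; this only shows the shape.
    prev = 0.05
    tp_rate = sens * prev                    # true positives per person assessed
    fp_rate = (1 - spec) * (1 - prev)        # false positives per person assessed
    for p_t in (0.05, 0.25, 0.45):
        print(f"p_t={p_t:.2f}  net benefit={net_benefit(tp_rate, fp_rate, p_t):+.3f}")

At a 5% base rate the sketch yields a PPV of roughly 0.17 even for this nominally strong model, and the net benefit turns negative as the threshold (i.e., the implied cost of a false positive) rises, which is the pattern the abstract describes.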
