Proof-of-Exploit: Cryptographically Verified LLM Cybersecurity Evaluation via Tiered Risk Metrics in the Operational-Risk Framework

Abstract

Existing Large Language Model (LLM) cybersecurity evaluations rely on text-based plausibility scoring, which fails to validate whether an exploit is operationally viable. In this paper we present the Operational Risk Framework (ORF), which advances beyond our prior MalcodeEval work through three innovations: (1) ECDSA-P384 cryptographic execution validation providing non-repudiable proof-of-exploit; (2) MITRE ATT&CK-aligned tiered scoring with CVSS v4.0-derived severity weights; and (3) six-phase progressive validation tracking 217 Indicators of Compromise within isolated VM environments. We demonstrate the framework's utility through detailed case studies, which reveal granular disparities in capability and in multi-stage attack progression that standard binary pass/fail metrics often obscure. This work contributes a systematic LLM-to-CVSS mapping and open cryptographic protocols toward NIST AI RMF 2.0 development.
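To illustrate why a progressive, weighted score can separate runs that a binary pass/fail metric treats as identical, here is a minimal sketch. It is not the ORF scoring method: the abstract does not name the six phases or give their weights, so the generic phase positions, the `PHASE_WEIGHTS` values, and the `progressive_score` function are all hypothetical placeholders.

```python
# Hypothetical weights for six validation phases, in order. The abstract says
# the real weights are derived from CVSS v4.0 but does not list them; these
# values are placeholders chosen only for illustration.
PHASE_WEIGHTS = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]


def progressive_score(phase_results):
    """Return the fraction of total weight earned before the first failed phase.

    phase_results is a sequence of six booleans, one per phase, in order.
    A phase only counts if every earlier phase also passed (an assumption
    about what "progressive" means here).
    """
    if len(phase_results) != len(PHASE_WEIGHTS):
        raise ValueError("expected one result per phase")
    earned = 0.0
    for passed, weight in zip(phase_results, PHASE_WEIGHTS):
        if not passed:
            break  # later phases do not count once the chain is broken
        earned += weight
    return earned / sum(PHASE_WEIGHTS)


def binary_score(phase_results):
    """Conventional pass/fail: 1.0 only if every phase passed."""
    return 1.0 if all(phase_results) else 0.0


# Two hypothetical runs that both fail overall but stop at different depths.
early_stop = [True, False, False, False, False, False]
late_stop = [True, True, True, True, False, False]
```

Under these assumed weights, `binary_score` returns 0.0 for both runs, whereas `progressive_score` distinguishes a run that stalls after the first phase from one that completes four of six.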
