The Evidence Ladder: Make AI Prove Itself Before It Judges Us

Abstract

Artificial intelligence systems increasingly score, sort, and advise people in welfare, policing, education, and employment. Many of these systems are trusted on the basis of thin evidence, such as benchmark scores, internal tests on historical data, or polished demonstrations, rather than robust evaluation in the real world. This paper argues that such deployments invert the burden of proof: affected people must show that they were harmed, while vendors rarely have to show that their tools work fairly and reliably for the communities they affect. Drawing on documented cases in child welfare, online proctoring, and facial recognition, I propose a simple evidence ladder for AI that ranges from basic lab tests to independent audits, monitored field trials, and formal certification with ongoing review. The novelty is a cross-domain, five-level scaffold that any team can use to state its current level of proof and to plan concrete steps toward stronger evidence. I link these levels to familiar engineering practices and to current policy frameworks, including the OECD AI Principles, the NIST AI Risk Management Framework, and the European Union AI Act. The central claim is that demanding evidence scaled to the stakes of the decision is a basic form of respect for the people whose lives AI systems judge.