Per-Sample Invariant Tracking on MNIST with Certainty-Validity Diagnostics

Datorien L. Anderson

Abstract

Standard evaluation metrics conflate different kinds of error. A confident incorrect prediction and an uncertain incorrect prediction are both counted identically, even though they reflect different epistemic states. This paper uses the Certainty-Validity (CVS) framework to track how those states migrate during learning. The Minimal Operative Unit (MOU) ring serves as a synthetic diagnostic environment in which quadrant migration, lock thresholds, and intervention timing can be calibrated under controlled conditions before the same procedure is applied per-sample on MNIST. On the ring, baseline training drives CVS from 0.79 to 0.07 while accumulating confident-incorrect states, whereas CVS-regulated and CVS-gated regimes stabilize CVS in the 0.62–0.75 range and reduce confident-incorrect predictions by 11×. On MNIST, freeze-based CVS interventions preserve aggregate CVS but also preserve still-learnable uncertainty, showing that aggregate control is too coarse on its own. Per-sample tracking localizes the residual instability to a small tail rather than broad model degradation: at confidence threshold θ = 0.7, MNIST exhibits no persistent uncertainty population, but it does exhibit a small persistent confident-error tail, a boundary-adjacent volatile set, and a small number of drift cases. In this setting, the main value of CVS is diagnostic: it identifies where commitment is stable, where it is forced, and where uncertainty remains unresolved during training.
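The abstract distinguishes confident from uncertain errors at a confidence threshold θ = 0.7 and tracks how samples move between those states during training. The following is a minimal sketch of that bookkeeping only, not the author's implementation: the function names, the quadrant labels, and the choice of `>=` at the threshold are all assumptions made for illustration, and the CVS score itself is not computed because the abstract does not define it.

```python
from collections import Counter

THETA = 0.7  # confidence threshold quoted in the abstract


def quadrant(probs, label, theta=THETA):
    """Assign one sample to a certainty/validity quadrant.

    probs: class probabilities for the sample; label: true class index.
    Quadrant names and the >= comparison are illustrative assumptions.
    """
    pred = max(range(len(probs)), key=probs.__getitem__)  # argmax class
    certainty = "confident" if probs[pred] >= theta else "uncertain"
    validity = "correct" if pred == label else "incorrect"
    return f"{certainty}-{validity}"


def quadrant_counts(batch_probs, labels, theta=THETA):
    """Aggregate view: how many samples fall in each quadrant."""
    return Counter(quadrant(p, y, theta) for p, y in zip(batch_probs, labels))


def transitions(history):
    """Per-sample view: quadrant changes between consecutive epochs."""
    return [(a, b) for a, b in zip(history, history[1:]) if a != b]


# Hypothetical softmax outputs for four samples of a 3-class problem.
batch_probs = [
    [0.90, 0.05, 0.05],
    [0.90, 0.05, 0.05],
    [0.50, 0.30, 0.20],
    [0.40, 0.35, 0.25],
]
labels = [0, 1, 0, 2]
counts = quadrant_counts(batch_probs, labels)
```

Recording `quadrant(...)` for each sample at every epoch and passing that list to `transitions` gives a per-sample migration trace. The aggregate counts alone cannot show whether the confident-incorrect population consists of the same samples from one epoch to the next, which is the distinction per-sample tracking is meant to expose.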

Version published to 10.21203/rs.3.rs-9075794/v1 on Research Square
Mar 11, 2026

Related articles

The Reliability Chasm: AI Accuracy Across Three Reasoning Domains
Kuldeep Pandit, Vtasla Pandit, Aayan Pandit
Latest version Mar 16, 2026

Bidirectional Dissociation Between Self-Report and Behavior in AI Status Sensitivity
Dustin James
Latest version Mar 26, 2026

Calibration Inversion and Data-Freshness Govern AI Reliability in Indeterminate Domains: Evidence from 2,730 Data Points/ 142 Cross-Protocol Sessions
Kuldeep Pandit
Latest version Mar 16, 2026
