Calibration Inversion and Data-Freshness Govern AI Reliability in Indeterminate Domains: Evidence from 2,730 Data Points / 142 Cross-Protocol Sessions

Kuldeep Pandit

Abstract

AI systems in fully indeterminate domains, where ground truth is probabilistic and available only post hoc, present distinct reliability challenges. This third study in a three-study programme analyses 2,730 data points across 142 sessions, six frontier models, and three protocols spanning financial markets, meteorology, sports, and cryptocurrency. The central finding is calibration inversion: Gemini achieves the best point-estimate accuracy (5.3% mean error, rank 1) yet the second-worst confidence-interval calibration (29.7%, rank 5; roughly 60 pp below the 90% target), a dissociation absent from all prior domains. A second finding is data-freshness failure: DeepSeek's S&P 500 predictions are systematically stale (12–13% error) and Perplexity's predictions freeze across sessions; both are failures of information-access architecture. The cross-protocol Spearman rank correlation (ρ = 0.695) indicates moderate ranking consistency. Claude alone achieves top-tier performance across all protocols (composite 93.6%; V1: 96.7%, V3: 100%, V4-CI: 84.2%). Four indeterminate-domain error types are identified; Type IV (calibration inversion) demands a calibration-verification layer in GAAS.
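The quantities named in the abstract can be illustrated with a minimal Python sketch. It assumes that "mean error" is a mean absolute percentage error, that confidence-interval calibration is measured as the share of realised outcomes falling inside the stated 90% intervals, and that ranks are untied; the abstract does not spell out these definitions. All function names and the toy numbers are hypothetical and are not taken from the study's data.

```python
from statistics import mean


def mean_abs_pct_error(predictions, actuals):
    """Mean absolute percentage error of point estimates, in percent."""
    return 100.0 * mean(abs(p - a) / abs(a) for p, a in zip(predictions, actuals))


def ci_coverage(intervals, actuals):
    """Share of realised outcomes inside the stated intervals, in percent."""
    hits = sum(1 for (low, high), a in zip(intervals, actuals) if low <= a <= high)
    return 100.0 * hits / len(actuals)


def calibration_gap(coverage_pct, nominal_pct=90.0):
    """Observed coverage minus nominal level, in percentage points.

    A negative value means the intervals are too narrow (overconfident).
    """
    return coverage_pct - nominal_pct


def spearman_rho(ranks_a, ranks_b):
    """Spearman rank correlation from two rank lists, assuming no ties."""
    n = len(ranks_a)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    return 1.0 - 6.0 * d_squared / (n * (n * n - 1))


# Toy data, invented for illustration only.
actuals = [100.0, 200.0, 50.0, 80.0]
predictions = [102.0, 196.0, 51.0, 80.0]
# Narrow intervals centred on the predictions: accurate points, poor coverage.
intervals = [(101.0, 103.0), (190.0, 198.0), (50.5, 51.5), (79.0, 81.0)]

point_error = mean_abs_pct_error(predictions, actuals)  # small point error
coverage = ci_coverage(intervals, actuals)              # only 1 of 4 covered
gap = calibration_gap(coverage)                         # far below the 90% target

# Hypothetical ranks of six models under two protocols.
rho = spearman_rho([1, 2, 3, 4, 5, 6], [2, 1, 3, 4, 6, 5])

print(f"point error {point_error:.1f}%, coverage {coverage:.1f}%, gap {gap:+.1f} pp")
print(f"Spearman rho {rho:.3f}")
```

The toy forecaster shows the pattern the abstract calls calibration inversion: point estimates sit close to the outcomes, yet the stated intervals are too narrow to contain them. Under the coverage reading, the reported figure of 29.7% against a 90% target gives `calibration_gap(29.7)` of about −60.3 pp.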

Version published to 10.21203/rs.3.rs-9059499/v1 on Research Square
Mar 16, 2026

Related articles

The Reliability Chasm: AI Accuracy Across Three Reasoning Domains

This article has 3 authors:
1. Kuldeep Pandit
2. Vtasla Pandit
3. Aayan Pandit
This article has no evaluations. Latest version: Mar 16, 2026

Beyond the Scaling Ceiling: An Architectural Phase Transition in AI Reliability

This article has 1 author:
1. Kuldeep Pandit
This article has no evaluations. Latest version: Mar 16, 2026

Per-Sample Invariant Tracking on MNIST with Certainty-Validity Diagnostics

This article has 1 author:
1. Datorien L. Anderson
This article has no evaluations. Latest version: Mar 11, 2026
