Quantifying Explainability in Healthcare AI with the Extended Collaborative Intelligence Index (X-CII): A Synthetic Evaluation Framework
Abstract
Human–AI collaboration in healthcare motivates explainable AI (XAI) to promote trust, safety, and regulatory alignment for high-risk systems under the EU AI Act [1] and IMDRF GMLP guidance [2]. We propose the Extended Collaborative Intelligence Index (X-CII), which integrates team quality (Q), effectiveness (E), and safety (S) through a risk-sensitive power mean (λ = 0.25). To link explainability directly to risk mitigation and address critiques of post-hoc XAI [3], our synthetic evaluation applies a conservative +5% multiplicative uplift in team detectability (d'), reflecting reported 5–10% task-performance gains with XAI [16,17]. Under the equal-variance binormal model, this increases AUC from 0.800 to approximately 0.813. The uplift modifies only S while keeping Q and E fixed. Unless otherwise stated, relative percentages are referenced to the better individual agent (human or AI). Using 10,000 paired Monte Carlo draws with independent skills (ρ = 0), the XAI-enhanced team achieved a median relative X-CII of 102.963% (IQR 101.24–104.56%) versus the better individual, outperforming it in 89.7% of cases. Versus an identical team without XAI, median X-CII rose by 0.811% (IQR 0.593–1.003%), with a 100% win rate, isolating explainability's incremental contribution. Under domain shift (AUC = 0.72 with adjusted fidelity/reliance parameters), the median remained 102.82%. Lower integration efficiency (η ≤ 0.8) reduced team performance below baseline, whereas negative skill correlation (ρ = -0.5), indicating complementary strengths, increased gains (median 108.66%). Here ρ denotes the human–AI skill correlation and η parameterizes integration efficiency (η = 1 is ideal). The X-CII framework can help quantify how explainability contributes to safe and effective human–AI teamwork and benchmark compliance-oriented design. Safety normalization (S = 1 - L / L_worst) ensures bounded, comparable scores, though it compresses high-performance differences. This work provides no legal advice; consult the official EU AI Act and competent authorities for regulatory interpretation.
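
The Python sketch below illustrates the quantities named above: the equal-variance binormal link between d' and AUC, the safety normalization S = 1 - L / L_worst, and a power-mean aggregation of Q, E, and S with λ = 0.25. It is a minimal sketch, not the evaluation pipeline behind the Monte Carlo results; it assumes equal weights in the power mean, and the Q, E, and loss values used for the final aggregation are hypothetical placeholders.

```python
# Minimal sketch of the core X-CII quantities (assumptions: equal weights in the
# power mean; the Q, E, and loss values below are hypothetical, for illustration only).
import numpy as np
from scipy.stats import norm

def dprime_from_auc(auc: float) -> float:
    """Equal-variance binormal model: d' = sqrt(2) * Phi^{-1}(AUC)."""
    return np.sqrt(2) * norm.ppf(auc)

def auc_from_dprime(d_prime: float) -> float:
    """Inverse relation: AUC = Phi(d' / sqrt(2))."""
    return norm.cdf(d_prime / np.sqrt(2))

def safety(loss: float, worst_loss: float) -> float:
    """Safety normalization from the abstract: S = 1 - L / L_worst."""
    return 1.0 - loss / worst_loss

def x_cii(q: float, e: float, s: float, lam: float = 0.25) -> float:
    """Risk-sensitive power mean of Q, E, S (equal weights assumed here)."""
    return ((q**lam + e**lam + s**lam) / 3.0) ** (1.0 / lam)

# +5% multiplicative uplift in team detectability, as described in the abstract.
d_base = dprime_from_auc(0.800)          # ~1.19
d_xai = 1.05 * d_base                    # XAI-enhanced team
print(round(auc_from_dprime(d_xai), 3))  # ~0.812-0.813

# Illustrative aggregation: Q and E held fixed, only S changes with detectability.
q, e = 0.85, 0.80                        # hypothetical values
s_base, s_xai = safety(0.20, 1.0), safety(0.18, 1.0)
rel = x_cii(q, e, s_xai) / x_cii(q, e, s_base)
print(f"relative X-CII vs. non-XAI team: {100 * rel:.2f}%")
```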