Quantifying Explainability in Healthcare AI with the Extended Collaborative Intelligence Index (X-CII): A Synthetic Evaluation Framework
Abstract
Human–AI collaboration in healthcare motivates explainable AI (XAI) to promote trust, safety, and regulatory alignment for high-risk systems under the EU AI Act [1] and IMDRF GMLP guidance [2]. We propose the Extended Collaborative Intelligence Index (X-CII), which integrates team quality (Q), effectiveness (E), and safety (S) through a risk-sensitive power mean (λ = 0.25). To link explainability directly to risk mitigation and address critiques of post-hoc XAI [3], our synthetic evaluation applies a conservative +5% multiplicative uplift in team detectability (d'), reflecting reported 5–10% task-performance gains with XAI [16,17]. Under the equal-variance binormal model, this increases AUC from 0.800 to approximately 0.813. The uplift modifies only S while keeping Q and E fixed. Unless otherwise stated, relative percentages are referenced to the better individual agent (human or AI). Using 10,000 paired Monte Carlo draws with independent skills (ρ = 0), the XAI-enhanced team achieved a median relative X-CII of 102.963% (IQR 101.24–104.56%) versus the better individual, outperforming it in 89.7% of cases. Versus an identical team without XAI, median X-CII rose by 0.811% (IQR 0.593–1.003%), with a 100% win rate, isolating explainability's incremental contribution. Under domain shift (AUC = 0.72 with adjusted fidelity/reliance parameters), the median remained 102.82%. Lower integration efficiency (η ≤ 0.8) reduced team performance below baseline, whereas negative skill correlation (ρ = -0.5), indicating complementary strengths, increased gains (median 108.66%). Here ρ denotes the human–AI skill correlation and η parameterizes integration efficiency (η = 1 is ideal). The X-CII framework can help quantify how explainability contributes to safe and effective human–AI teamwork and benchmark compliance-oriented design. Safety normalization (S = 1 - L / L_worst) ensures bounded, comparable scores, though it compresses high-performance differences. This work provides no legal advice; consult the official EU AI Act and competent authorities for regulatory interpretation.
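
The Python sketch below illustrates the quantities named above: the equal-variance binormal link between d' and AUC, the safety normalization S = 1 - L / L_worst, and a power-mean aggregation of Q, E, and S with λ = 0.25. It is a minimal sketch, not the evaluation pipeline behind the Monte Carlo results; it assumes equal weights in the power mean, and the Q, E, and loss values used for the final aggregation are hypothetical placeholders.

```python
# Minimal sketch of the core X-CII quantities (assumptions: equal weights in the
# power mean; the Q, E, and loss values below are hypothetical, for illustration only).
import numpy as np
from scipy.stats import norm

def dprime_from_auc(auc: float) -> float:
    """Equal-variance binormal model: d' = sqrt(2) * Phi^{-1}(AUC)."""
    return np.sqrt(2) * norm.ppf(auc)

def auc_from_dprime(d_prime: float) -> float:
    """Inverse relation: AUC = Phi(d' / sqrt(2))."""
    return norm.cdf(d_prime / np.sqrt(2))

def safety(loss: float, worst_loss: float) -> float:
    """Safety normalization from the abstract: S = 1 - L / L_worst."""
    return 1.0 - loss / worst_loss

def x_cii(q: float, e: float, s: float, lam: float = 0.25) -> float:
    """Risk-sensitive power mean of Q, E, S (equal weights assumed here)."""
    return ((q**lam + e**lam + s**lam) / 3.0) ** (1.0 / lam)

# +5% multiplicative uplift in team detectability, as described in the abstract.
d_base = dprime_from_auc(0.800)          # ~1.19
d_xai = 1.05 * d_base                    # XAI-enhanced team
print(round(auc_from_dprime(d_xai), 3))  # ~0.812-0.813

# Illustrative aggregation: Q and E held fixed, only S changes with detectability.
q, e = 0.85, 0.80                        # hypothetical values
s_base, s_xai = safety(0.20, 1.0), safety(0.18, 1.0)
rel = x_cii(q, e, s_xai) / x_cii(q, e, s_base)
print(f"relative X-CII vs. non-XAI team: {100 * rel:.2f}%")
```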