Efficient Layer-wise Attribution Method for Scalable Explainability in VLMs
Abstract
Large vision-language models (VLMs) with billions of parameters are increasingly deployed in high-stakes applications such as medical diagnosis, autonomous driving, and content moderation, yet their decision-making processes remain opaque. Existing explainable AI (XAI) methods face severe computational bottlenecks when applied to these large-scale models, with explanation generation times exceeding 45 seconds per sample on standard hardware, limiting their practical utility in real-world scenarios. This study addresses this critical gap by proposing the Efficient Layer-wise Attribution Method (ELAM), a novel scalable XAI approach that leverages layer-wise gradient approximation and selective attention mechanism analysis to generate faithful explanations with significantly reduced computational overhead. We evaluate ELAM on three state-of-the-art VLMs spanning more than an order of magnitude in model size: CLIP-ViT-L/14 (428M parameters), BLIP-2 (2.7B parameters), and LLaVA-1.5-7B (7B parameters), across 2,500 image-text pairs from the MS-COCO and Flickr30k datasets. Experimental results demonstrate that ELAM achieves an 87.3% computational efficiency improvement (7.8× to 11.9× speedup) over gradient-based baselines while maintaining 94.2% explanation fidelity as measured by insertion-deletion scores. Furthermore, ELAM successfully scales to models with up to 7 billion parameters, reducing explanation generation time from 45.2 seconds to 3.8 seconds per sample on standard hardware. Our method provides a practical solution for deploying transparent and accountable VLMs in critical domains where both accuracy and interpretability are essential.
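The abstract does not specify how the insertion-deletion fidelity scores were computed, but the metric itself is standard: pixels are ranked by the attribution map, then progressively inserted into (or deleted from) the input while the model's score is tracked, and fidelity is the area under that curve. The sketch below is a generic, model-agnostic implementation of that metric, not ELAM itself; `model_fn` is a hypothetical callable returning a scalar confidence for an image.

```python
import numpy as np

def insertion_deletion_auc(model_fn, image, saliency,
                           steps=20, mode="deletion", baseline=0.0):
    """Generic insertion-deletion fidelity metric (sketch).

    Pixels are ranked by `saliency` (most salient first). In "deletion"
    mode they are progressively replaced by `baseline`; in "insertion"
    mode they are progressively revealed on a baseline image. The model
    score is recorded at each step and the trapezoidal area under the
    resulting curve over [0, 1] is returned. A faithful attribution map
    yields a high insertion AUC and a low deletion AUC.
    """
    order = np.argsort(saliency.ravel())[::-1]  # most salient pixels first
    n = order.size
    current = image.copy() if mode == "deletion" else np.full_like(image, baseline)
    scores = [model_fn(current)]
    for k in range(1, steps + 1):
        # Next chunk of pixel indices to flip at this step.
        idx = order[(k - 1) * n // steps : k * n // steps]
        src = baseline if mode == "deletion" else image.ravel()[idx]
        current.ravel()[idx] = src
        scores.append(model_fn(current))
    # Trapezoidal rule with uniform spacing 1/steps over [0, 1].
    return (0.5 * (scores[0] + scores[-1]) + sum(scores[1:-1])) / steps
```

With a toy model that scores an image by its mean intensity and a saliency map equal to the image itself, the insertion curve rises quickly and the deletion curve falls quickly, so the insertion AUC exceeds the deletion AUC, matching the intuition behind the 94.2% fidelity figure reported above.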