Strongly Connected Components Are All You Need: Graph-Theoretic Interpretability and Optimization for Vision Transformers
Abstract
Vision Transformers have revolutionized computer vision with unprecedented performance, yet their attention mechanisms remain largely opaque, hindering both trust and optimization in critical applications [1, 2]. Despite significant advances in transformer interpretability, existing approaches provide limited structural insights into how attention patterns encode visual information, creating a fundamental barrier to understanding these powerful models [3, 4]. We address this challenge by introducing a novel, comprehensive graph-theoretic framework for analyzing Vision Transformer attention mechanisms. Our approach treats attention matrices as directed graphs and applies Tarjan’s Strongly Connected Components algorithm to reveal hidden structural patterns within attention flows [5]. Through systematic evaluation of 22 pre-trained models across ViT, DeiT, and CLIP architectures, we discover that Strongly Connected Components with meaningful connectivity directly correspond to semantically coherent visual features—providing a mathematical foundation for understanding attention-based feature learning [6]. This insight enables a novel optimization strategy: layers containing minimal meaningful connectivity can be selectively ablated through attention linearization, achieving 16-30% inference speedup with only 0-7% accuracy loss for large models. Our findings demonstrate that ViT models exhibit superior structural organization (92.6% effectiveness) compared to DeiT (51.1%) and CLIP (57.2%) variants, while patch32 configurations achieve optimal performance tradeoffs [7, 8]. Beyond interpretability, this work establishes graph theory as a powerful lens for transformer analysis and provides practical tools for deploying efficient vision models without sacrificing semantic understanding [9].
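The core construction described above (attention matrix treated as a directed graph, then decomposed into strongly connected components) can be illustrated with a minimal sketch. The threshold tau, the single-head input shape, and the use of scipy's connected_components with connection='strong' in place of a hand-written Tarjan implementation are illustrative assumptions, not the paper's exact procedure.

import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components

def attention_sccs(attn: np.ndarray, tau: float = 0.05):
    """Treat a single-head attention matrix (tokens x tokens) as a directed
    graph and label each token with its strongly connected component.
    An edge i -> j is kept only where the attention weight exceeds tau."""
    adjacency = csr_matrix((attn > tau).astype(np.int8))
    n_components, labels = connected_components(
        adjacency, directed=True, connection='strong')
    return n_components, labels

# Toy usage: a random row-stochastic "attention" matrix over 16 patch tokens.
rng = np.random.default_rng(0)
attn = rng.random((16, 16))
attn /= attn.sum(axis=1, keepdims=True)
n_sccs, labels = attention_sccs(attn, tau=0.05)
print(f"{n_sccs} strongly connected components; labels: {labels}")

Components whose labels group multiple tokens would, under the abstract's claim, correspond to semantically coherent patch clusters; layers where nearly every token falls into its own trivial component are the candidates for the attention-linearization ablation described above.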