Multi-Agent based Dynamic Anchors for Interpretation of Deep Learning Classifiers
Abstract
Explainable Artificial Intelligence (XAI) provides insights into how black-box models make decisions. Among existing approaches, anchors provide high-precision, human-interpretable rules in the form of simple if-then conditions over input features. Classical anchors compute discrete instance-wise rules using a bandit-guided beam search, without learning across instances or coordinating rules across classes. Consequently, they are fundamentally local and do not yield a coherent picture of the model's decision regions. We propose Reinforcement Learning Dynamic Anchors (RLDA), a reinforcement learning (RL) formulation of anchor discovery in which a policy learns to refine an axis-aligned box around an instance through a sequence of continuous actions, directly optimizing interpretable quantities such as precision and coverage. We then extend this framework to Multi-Agent Dynamic Anchors (MADA), a cooperative game with one or more agents per class, in which agents jointly learn class-wise anchor regions under shared rewards that encourage both local fidelity and global structure, operating under defined equilibrium conditions. The trained policies were applied to data samples to generate both instance- and class-level rules, which were then evaluated globally across all classes. Experiments on standard tabular datasets showed that, first, RLDA produces more precise rules with performance comparable to classical anchors while yielding reusable policies; and second, MADA yields class-wise rules with high precision, useful coverage, and reduced cross-class overlap, thereby providing a more global and structured explanation of the classifier.
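To make the optimized quantities concrete, the sketch below shows how the precision and coverage of an axis-aligned anchor box can be computed on tabular data. This is an illustrative sketch only, not the paper's implementation; the function name, the toy threshold classifier, and the specific box bounds are assumptions introduced here for demonstration.

```python
import numpy as np

def anchor_precision_coverage(X, model_predict, box_lo, box_hi, target_class):
    """Precision and coverage of an axis-aligned box rule (illustrative).

    A sample satisfies the anchor when every feature lies in [box_lo, box_hi].
    Precision is the fraction of covered samples the classifier assigns to
    `target_class`; coverage is the fraction of all samples the box covers.
    """
    inside = np.all((X >= box_lo) & (X <= box_hi), axis=1)
    coverage = float(inside.mean())
    if not inside.any():
        return 0.0, 0.0
    preds = model_predict(X[inside])
    precision = float(np.mean(preds == target_class))
    return precision, coverage

# Toy example: 2-D uniform data with a hypothetical threshold classifier
# that predicts class 1 iff feature 0 exceeds 0.5.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(1000, 2))
model = lambda x: (x[:, 0] > 0.5).astype(int)

# A box requiring feature 0 >= 0.6 lies entirely inside the class-1 region,
# so its precision is 1.0 and its coverage is roughly 0.4.
prec, cov = anchor_precision_coverage(
    X, model,
    box_lo=np.array([0.6, 0.0]),
    box_hi=np.array([1.0, 1.0]),
    target_class=1,
)
```

An RL agent as described above would adjust `box_lo` and `box_hi` through continuous actions, with a reward built from these two quantities.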