Interpreting BERT Using LIME and SHAP
Abstract
Transformer-based language models such as BERT have achieved state-of-the-art performance on diverse natural language processing tasks, yet their decision processes remain opaque. This paper presents a comprehensive framework for interpreting BERT’s predictions in multi-label text classification using two leading model-agnostic explainability techniques—Local Interpretable Model-Agnostic Explanations (LIME) and SHapley Additive exPlanations (SHAP). An end-to-end pipeline for fine-tuning BERT and producing token-level attributions is introduced. We systematically compare the explainers with respect to local fidelity, global consistency, stability and computational cost. Experimental results suggest that LIME generates intuitive, case-specific explanations while SHAP provides theoretically grounded and globally consistent attributions. By integrating the complementary strengths of both methods, we propose a hybrid interpretation strategy that balances interpretability, scalability and accuracy. The methodology is illustrated through a case study on multi-label genre classification from movie plot summaries. Detailed guidelines and synthetic visualisations are provided to enable practitioners to apply these techniques effectively and responsibly.
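To make the described pipeline concrete, the sketch below shows one plausible way to wrap a fine-tuned multi-label BERT classifier in a probability function and pass it to both LIME and SHAP. It is a minimal illustration, not the paper's exact implementation: the checkpoint name "bert-genre-multilabel", the genre label set, and the example plot summary are all hypothetical placeholders.

```python
# Minimal sketch (assumptions: a BERT checkpoint fine-tuned for multi-label genre
# classification is available as "bert-genre-multilabel"; labels and the example
# plot summary below are illustrative, not taken from the paper).
import torch
import shap
from lime.lime_text import LimeTextExplainer
from transformers import AutoTokenizer, AutoModelForSequenceClassification

GENRES = ["action", "comedy", "drama", "horror", "romance"]  # hypothetical label set

tokenizer = AutoTokenizer.from_pretrained("bert-genre-multilabel")
model = AutoModelForSequenceClassification.from_pretrained("bert-genre-multilabel")
model.eval()

def predict_proba(texts):
    """Return per-label sigmoid probabilities for a batch of raw strings."""
    enc = tokenizer(list(texts), padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        logits = model(**enc).logits
    return torch.sigmoid(logits).numpy()  # shape: (n_texts, n_labels)

plot = "A retired detective is pulled back in for one last heist that goes wrong."

# LIME: perturb the input locally and fit a sparse linear surrogate per label.
lime_explainer = LimeTextExplainer(class_names=GENRES)
lime_exp = lime_explainer.explain_instance(
    plot, predict_proba, num_features=10, labels=list(range(len(GENRES)))
)
print(lime_exp.as_list(label=0))  # (token, weight) pairs for the first genre

# SHAP: Shapley-value attributions over a text masker built from the tokenizer.
shap_explainer = shap.Explainer(predict_proba, shap.maskers.Text(tokenizer))
shap_values = shap_explainer([plot])
print(shap_values.values.shape)  # (1, n_tokens, n_labels)
```

In this setup the same `predict_proba` wrapper serves both explainers, which keeps the LIME and SHAP attributions comparable: any difference between them reflects the explanation method rather than a difference in how the model was queried.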