A Universal Knowledge Retention Metric for Evaluating Knowledge Distillation Models Across Architectures and Datasets

Abstract

This paper introduces the Knowledge Retention Score (KRS), a novel performance metric for evaluating the effectiveness of Knowledge Distillation (KD) across diverse tasks, including image classification, object detection, and image segmentation. Unlike conventional metrics that rely solely on accuracy, mean Average Precision (mAP), or Intersection over Union (IoU), KRS captures both feature similarity and output agreement between the teacher and student networks, offering a more nuanced measure of knowledge transfer. A total of 36 experiments were conducted using various KD methods (Vanilla KD, SKD, FitNet, ART, UET, GKD, GLD, and CRCD) across multiple datasets and architectures. The results showed that KRS is strongly correlated with conventional metrics, validating its reliability. Moreover, ablation studies confirmed KRS's sensitivity to the knowledge gained after distillation. The paper also ranked KD methods by performance gain, revealing that techniques such as UET and GLD consistently outperform the others. Lastly, an architectural generalization analysis demonstrated KRS's robustness across different teacher–student pairs. These findings establish KRS as a comprehensive and interpretable metric, capable of guiding KD method selection and providing deeper insight into knowledge transfer beyond standard performance measures.
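The abstract does not give the exact formulation of KRS, only that it combines feature similarity with output agreement between teacher and student. The snippet below is a minimal sketch of how such a score could be computed, assuming KRS is a weighted combination of cosine similarity between (dimension-matched) feature maps and top-1 prediction agreement; the function name `knowledge_retention_score` and the weight `alpha` are hypothetical and may differ from the paper's definition.

```python
import torch
import torch.nn.functional as F

def knowledge_retention_score(teacher_feats, student_feats,
                              teacher_logits, student_logits,
                              alpha=0.5):
    """Hypothetical KRS sketch: weighted combination of feature
    similarity and output agreement between teacher and student.

    Assumes teacher and student features have already been projected
    to the same dimensionality, as is common in feature-based KD.
    """
    # Feature similarity: mean cosine similarity between flattened features
    t = teacher_feats.flatten(start_dim=1)
    s = student_feats.flatten(start_dim=1)
    feat_sim = F.cosine_similarity(t, s, dim=1).mean()

    # Output agreement: fraction of samples where teacher and student
    # predict the same class
    agreement = (teacher_logits.argmax(dim=1)
                 == student_logits.argmax(dim=1)).float().mean()

    # Weighted combination (alpha is an assumed hyperparameter)
    return alpha * feat_sim.item() + (1 - alpha) * agreement.item()
```

Under these assumptions, the score rewards a student that both reproduces the teacher's intermediate representations and agrees with its final predictions, which is consistent with the abstract's claim that KRS goes beyond accuracy, mAP, or IoU alone.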
