A General-Purpose Knowledge Retention Metric for Evaluating Distillation Models Across Architectures and Tasks
Abstract
Background: Knowledge distillation (KD) compresses deep neural networks by transferring knowledge from a high-capacity teacher model to a lightweight student model. However, conventional evaluation metrics such as accuracy, mAP, IoU, or RMSE focus mainly on task performance and overlook how effectively the student internalizes the teacher’s knowledge.
Methods: This study introduces the Knowledge Retention Score (KRS), a composite metric that integrates intermediate feature similarity and output agreement into a single interpretable score to quantify knowledge retention. KRS was primarily validated in computer vision (CV) through 36 experiments covering image classification, object detection, and semantic segmentation using diverse datasets and eight representative KD methods. Supplementary experiments were conducted in natural language processing (NLP) using transformer-based models on SST-2, and in time series regression with convolutional teacher–student pairs.
Results: Across all domains, KRS correlated strongly with standard performance metrics while revealing internal retention dynamics that conventional evaluations often overlook. By reporting feature similarity and output agreement separately alongside the composite score, KRS provides transparent and interpretable insights into knowledge transfer.
Conclusions: KRS offers a stable diagnostic tool and a complementary evaluation metric for KD research. Its generality across domains demonstrates its potential as a standardized framework for assessing knowledge retention beyond task-specific performance measures.
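The abstract does not give the exact formula for KRS, so the sketch below is only a rough illustration of how a composite retention score of this kind could be computed: a feature-similarity term (linear CKA is assumed here) and an output-agreement term (top-1 prediction match) are combined through a weight alpha. The choice of CKA, the agreement measure, and the weighting are all assumptions for illustration, not the paper's definition.

```python
import numpy as np


def linear_cka(X, Y):
    """Linear CKA between feature matrices of shape (n_samples, dim).
    Assumed here as one plausible intermediate feature similarity measure."""
    X = X - X.mean(axis=0, keepdims=True)
    Y = Y - Y.mean(axis=0, keepdims=True)
    hsic = np.linalg.norm(X.T @ Y, "fro") ** 2
    return hsic / (np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro"))


def output_agreement(teacher_logits, student_logits):
    """Fraction of samples where teacher and student predict the same class."""
    return float(np.mean(teacher_logits.argmax(axis=1) == student_logits.argmax(axis=1)))


def knowledge_retention_score(teacher_feats, student_feats,
                              teacher_logits, student_logits, alpha=0.5):
    """Composite score: weighted mix of feature similarity and output agreement.
    The weighting scheme (alpha) is an assumption, not taken from the paper."""
    feat_sim = linear_cka(teacher_feats, student_feats)
    out_agree = output_agreement(teacher_logits, student_logits)
    return {
        "feature_similarity": feat_sim,   # reported separately, as the abstract suggests
        "output_agreement": out_agree,    # reported separately, as the abstract suggests
        "KRS": alpha * feat_sim + (1.0 - alpha) * out_agree,
    }


# Toy usage with random features and logits for a 10-class classification task.
rng = np.random.default_rng(0)
t_feats, s_feats = rng.normal(size=(256, 128)), rng.normal(size=(256, 64))
t_logits, s_logits = rng.normal(size=(256, 10)), rng.normal(size=(256, 10))
print(knowledge_retention_score(t_feats, s_feats, t_logits, s_logits))
```

Reporting the two components alongside the composite value mirrors the abstract's emphasis on transparency: a student can score well on output agreement yet poorly on feature similarity, which a single task metric would not reveal.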