Multi-Teacher Knowledge Distillation via Tucker-Guided Representation Alignment and Adaptive Feature Mapping

Abstract

Knowledge distillation that fuses the feature maps of multiple strong teachers is a promising technique for training a student network. However, when the spatial shapes of the teacher and student feature maps differ substantially, effective distillation becomes challenging, and directly computing their differences is not meaningful. In this paper, a novel framework for structured knowledge distillation based on adaptive feature alignment and Tucker decomposition is proposed. The framework combines Tucker decomposition with learnable convolutional regression to enable structured, multi-path feature distillation from multiple teachers. The high-level feature tensor of each teacher is decomposed into core semantic representations, which are adaptively projected onto student layers through learnable regressors. By providing semantically rich representations that guide several layers of the student network, the approach enables multi-teacher supervision. Finally, an adaptive hybrid loss is proposed to guide the transfer of core-tensor knowledge from the teachers to the student. Experimental results on CIFAR-100 and Tiny-ImageNet demonstrate that the proposed approach consistently outperforms state-of-the-art distillation baselines, achieving a classification accuracy of 96.48% on CIFAR-100 and 91.70% on Tiny-ImageNet.
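The abstract outlines the pipeline: each teacher's high-level feature tensor is reduced to a Tucker core, a learnable convolutional regressor maps student features to the core's shape, and a hybrid loss combines a logit term with a feature-alignment term. The following PyTorch sketch is an illustration only, not the authors' implementation: it uses a sequentially truncated HOSVD as the Tucker step, and the names (`hosvd_core`, `ConvRegressor`, `hybrid_kd_loss`), the 1x1-convolution regressor, the ranks, and the loss weights are all assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def hosvd_core(x, ranks):
    """Sequentially truncated HOSVD (one simple way to compute a Tucker core).

    x: a single feature map of shape (C, H, W); ranks: target core shape.
    Each mode is projected onto the leading left singular vectors of its
    unfolding, yielding a compact core tensor of shape `ranks`.
    """
    core = x
    for mode, r in enumerate(ranks):
        # mode-n unfolding: bring `mode` to the front, flatten the rest
        unfolded = core.movedim(mode, 0).reshape(core.shape[mode], -1)
        U, _, _ = torch.linalg.svd(unfolded, full_matrices=False)
        U_r = U[:, :r]
        # project this mode onto its leading r-dimensional subspace
        core = torch.tensordot(U_r.t(), core, dims=([1], [mode]))
        core = core.movedim(0, mode)
    return core


class ConvRegressor(nn.Module):
    """Learnable regressor (hypothetical) mapping a student feature map to the
    channel and spatial shape of a teacher's Tucker core."""

    def __init__(self, in_ch, out_ch, out_hw):
        super().__init__()
        self.proj = nn.Conv2d(in_ch, out_ch, kernel_size=1)
        self.out_hw = out_hw

    def forward(self, f_student):
        f = self.proj(f_student)                      # match channels
        return F.adaptive_avg_pool2d(f, self.out_hw)  # match spatial size


def hybrid_kd_loss(student_logits, teacher_logits, aligned_feats, teacher_cores,
                   alpha=0.5, T=4.0):
    """Hybrid loss sketch: KL on softened logits plus MSE between regressed
    student features and teacher core tensors (alpha and T are illustrative,
    not the paper's adaptive weights)."""
    kl = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                  F.softmax(teacher_logits / T, dim=1),
                  reduction="batchmean") * T * T
    feat = sum(F.mse_loss(f, c) for f, c in zip(aligned_feats, teacher_cores))
    return alpha * kl + (1 - alpha) * feat


if __name__ == "__main__":
    # toy shapes: one teacher feature map per sample, batch of 2
    teacher_feats = torch.randn(2, 64, 8, 8)
    student_feats = torch.randn(2, 32, 16, 16)
    ranks = (16, 4, 4)                                # assumed Tucker ranks
    cores = torch.stack([hosvd_core(f, ranks) for f in teacher_feats])
    reg = ConvRegressor(in_ch=32, out_ch=ranks[0], out_hw=ranks[1:])
    aligned = reg(student_feats)                      # (2, 16, 4, 4)
    loss = hybrid_kd_loss(torch.randn(2, 100), torch.randn(2, 100),
                          [aligned], [cores])
    print(loss.item())
```

A 1x1 convolution plus adaptive pooling is one lightweight way to bridge channel and spatial mismatches between student layers and teacher cores; the paper's actual regressors, ranks, and adaptive loss weighting may differ.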
