Multi-Teacher Knowledge Distillation via Tucker-Guided Representation Alignment and Adaptive Feature Mapping

Abstract

Knowledge distillation that fuses the feature maps of multiple strong teachers is a promising technique for training a student network. However, when the spatial shapes of the teacher and student feature maps differ substantially, effective distillation becomes challenging, and directly computing their differences is not meaningful. In this paper, a novel framework for structured knowledge distillation based on adaptive feature alignment and Tucker decomposition is proposed. The framework combines Tucker decomposition with learnable convolutional regression to enable structured, multi-path feature distillation from multiple teachers. The high-level feature tensor of each teacher is decomposed into core semantic representations, which are adaptively projected onto student layers through learnable regressors. By providing semantically rich representations that guide several layers of the student network, the approach enables multi-teacher supervision. Finally, an adaptive hybrid loss is proposed to guide the transfer of core-tensor knowledge from the teachers to the student. Experimental results on CIFAR-100 and Tiny-ImageNet demonstrate that the proposed approach consistently outperforms state-of-the-art distillation baselines, achieving a classification accuracy of 96.48% on CIFAR-100 and 91.70% on Tiny-ImageNet.
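The abstract outlines the pipeline: each teacher's high-level feature tensor is reduced to a Tucker core, a learnable convolutional regressor maps student features to the core's shape, and a hybrid loss combines a logit term with a feature-alignment term. The following PyTorch sketch is an illustration only, not the authors' implementation: it uses a sequentially truncated HOSVD as the Tucker step, and the names (`hosvd_core`, `ConvRegressor`, `hybrid_kd_loss`), the 1x1-convolution regressor, the ranks, and the loss weights are all assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def hosvd_core(x, ranks):
    """Sequentially truncated HOSVD (one simple way to compute a Tucker core).

    x: a single feature map of shape (C, H, W); ranks: target core shape.
    Each mode is projected onto the leading left singular vectors of its
    unfolding, yielding a compact core tensor of shape `ranks`.
    """
    core = x
    for mode, r in enumerate(ranks):
        # mode-n unfolding: bring `mode` to the front, flatten the rest
        unfolded = core.movedim(mode, 0).reshape(core.shape[mode], -1)
        U, _, _ = torch.linalg.svd(unfolded, full_matrices=False)
        U_r = U[:, :r]
        # project this mode onto its leading r-dimensional subspace
        core = torch.tensordot(U_r.t(), core, dims=([1], [mode]))
        core = core.movedim(0, mode)
    return core


class ConvRegressor(nn.Module):
    """Learnable regressor (hypothetical) mapping a student feature map to the
    channel and spatial shape of a teacher's Tucker core."""

    def __init__(self, in_ch, out_ch, out_hw):
        super().__init__()
        self.proj = nn.Conv2d(in_ch, out_ch, kernel_size=1)
        self.out_hw = out_hw

    def forward(self, f_student):
        f = self.proj(f_student)                      # match channels
        return F.adaptive_avg_pool2d(f, self.out_hw)  # match spatial size


def hybrid_kd_loss(student_logits, teacher_logits, aligned_feats, teacher_cores,
                   alpha=0.5, T=4.0):
    """Hybrid loss sketch: KL on softened logits plus MSE between regressed
    student features and teacher core tensors (alpha and T are illustrative,
    not the paper's adaptive weights)."""
    kl = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                  F.softmax(teacher_logits / T, dim=1),
                  reduction="batchmean") * T * T
    feat = sum(F.mse_loss(f, c) for f, c in zip(aligned_feats, teacher_cores))
    return alpha * kl + (1 - alpha) * feat


if __name__ == "__main__":
    # toy shapes: one teacher feature map per sample, batch of 2
    teacher_feats = torch.randn(2, 64, 8, 8)
    student_feats = torch.randn(2, 32, 16, 16)
    ranks = (16, 4, 4)                                # assumed Tucker ranks
    cores = torch.stack([hosvd_core(f, ranks) for f in teacher_feats])
    reg = ConvRegressor(in_ch=32, out_ch=ranks[0], out_hw=ranks[1:])
    aligned = reg(student_feats)                      # (2, 16, 4, 4)
    loss = hybrid_kd_loss(torch.randn(2, 100), torch.randn(2, 100),
                          [aligned], [cores])
    print(loss.item())
```

A 1x1 convolution plus adaptive pooling is one lightweight way to bridge channel and spatial mismatches between student layers and teacher cores; the paper's actual regressors, ranks, and adaptive loss weighting may differ.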
