Human-Machine Collaborative Enhanced Interpretable Distillation Model for High-Precision Online Defect Detection

Shuxuan Zhao
Yunqing Tang
Hongwei Xu
Lilan Liu
Wei Qin
Jie Zhang


Abstract

Online vision-based defect detection is widely favored in smart manufacturing because it provides immediate feedback and enables timely correction. However, effective human-machine collaboration in practical deployment faces significant challenges. Existing models often lack interpretability, which prevents operators from understanding the rationale behind model decisions, intervening effectively in critical judgments, or optimizing the process, and thus limits the system's reliability and efficiency. At the same time, online detection imposes stringent real-time requirements. To address both challenges, this research proposes a Human-Machine Collaborative Enhanced Interpretable Knowledge Distillation strategy. It aims to improve the real-time performance of detection models while maintaining high accuracy and, crucially, interpretability, thereby supporting human-machine collaboration. First, a CNN-Transformer hybrid network is designed that combines self-attention, which provides a global receptive field, with convolution, which provides a local receptive field, to robustly extract features of tiny and irregularly shaped defects. Second, an explainable knowledge quantization method is devised that quantizes defect and texture features into interpretable knowledge units, explicitly characterizing the model's feature-extraction capability and providing a transparent basis for human interaction. Finally, an explainable knowledge alignment loss function is proposed. It uses the teacher model's superior defect feature extraction capability as a key learning objective for the student model, enabling the student to detect defects more precisely with a simpler network architecture. Experimental results show that the proposed CNN-Transformer hybrid network achieves over 95% accuracy and recall, and visualization experiments confirm that the method focuses more closely on defect features.
More importantly, the explainable knowledge distillation strategy significantly outperforms other lightweight methods. It not only satisfies the stringent accuracy and real-time requirements of online defect detection but also, through its inherent interpretability, directly supports human-machine collaboration. This allows operators to understand, trust, and effectively use the model's outputs, jointly improving the overall performance of the detection system.
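The abstract does not give the form of the explainable knowledge alignment loss. Purely as a point of reference, the sketch below shows a generic distillation objective of the kind such a loss typically builds on: cross-entropy against the ground-truth label, a Hinton-style temperature-softened KL term between teacher and student outputs, and a feature-alignment term. The defect-weighted alignment (`masked_feature_alignment`, `defect_weight`, the boolean `defect_mask`) is a hypothetical stand-in for "using the teacher's defect features as a learning objective", not the authors' method, and all function names and default weights (`alpha`, `beta`, `temperature`) are assumptions. Plain Python lists are used so the sketch has no dependencies.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax, shifted by the max for numerical stability."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def cross_entropy(logits: Sequence[float], label: int) -> float:
    """Cross-entropy of the student's prediction against the true class label."""
    return -math.log(softmax(logits)[label])


def response_distillation(student_logits, teacher_logits, temperature: float = 4.0) -> float:
    """Soft-target loss (Hinton et al.): T^2 * KL(teacher || student) at temperature T."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    return temperature ** 2 * kl_divergence(p_teacher, p_student)


def masked_feature_alignment(student_feat, teacher_feat, defect_mask,
                             defect_weight: float = 4.0) -> float:
    """HYPOTHETICAL alignment term: weighted mean of squared L2 distances between
    student and teacher feature vectors, where positions flagged as defect are
    up-weighted by `defect_weight`. Assumes student features were already
    projected to the teacher's dimensionality."""
    weighted_sum = 0.0
    weight_total = 0.0
    for s_vec, t_vec, is_defect in zip(student_feat, teacher_feat, defect_mask):
        w = defect_weight if is_defect else 1.0
        weighted_sum += w * sum((s - t) ** 2 for s, t in zip(s_vec, t_vec))
        weight_total += w
    return weighted_sum / weight_total if weight_total > 0 else 0.0


def distillation_objective(student_logits, teacher_logits, label,
                           student_feat, teacher_feat, defect_mask,
                           alpha: float = 0.5, beta: float = 1.0,
                           temperature: float = 4.0) -> float:
    """Combined loss: (1 - alpha) * CE + alpha * KD + beta * feature alignment.
    alpha, beta and temperature are assumed defaults, not values from the paper."""
    ce = cross_entropy(student_logits, label)
    kd = response_distillation(student_logits, teacher_logits, temperature)
    align = masked_feature_alignment(student_feat, teacher_feat, defect_mask)
    return (1.0 - alpha) * ce + alpha * kd + beta * align


if __name__ == "__main__":
    # Toy two-class example (defect vs. no defect) with two spatial positions.
    teacher_logits = [2.0, 0.5]
    student_logits = [1.0, 0.8]
    student_feat = [[0.1, 0.2], [0.9, 0.4]]
    teacher_feat = [[0.1, 0.2], [1.0, 0.6]]
    defect_mask = [False, True]  # the second position lies on a defect
    loss = distillation_objective(student_logits, teacher_logits, 0,
                                  student_feat, teacher_feat, defect_mask)
    print(f"total loss: {loss:.4f}")
```

The `T^2` factor is the usual way to keep the soft-target gradients on a scale comparable to the cross-entropy term when the temperature is raised. Up-weighting defect positions is one simple way to make small defects, which cover few pixels, dominate the alignment signal; a real implementation would operate on batched feature maps in a tensor library rather than on Python lists.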

Version published to 10.21203/rs.3.rs-7043856/v1 on Research Square
Jul 25, 2025

Related articles

Hybrid Framework for Interpretable Deepfake Video Detection Using CapsNet and Transformer Encoders

This article has 5 authors:
1. Gargi Kadam
2. Sanika Tiwarekar
3. Yash Sonawane
4. Kailas Devadkar
5. Jignesh Sisodia
Latest version: Aug 21, 2025

BCGH-Net: A Hierarchical Neural Framework for Fake News Detection Using BERT and Attention Fusion

This article has 4 authors:
1. Hussein Al-kaabi
2. Fuqdan Al-ibrahimi
3. Ali kadhim jasim
4. Zainab S. Idan
Latest version: Jul 24, 2025

RLDSCP: Reducing Label Dependency with Self-Attention and Contrastive Pretraining

This article has 2 authors:
1. sai prabanjan kumar kalvapalli
2. MALA C
Latest version: Aug 27, 2025
