TY - JOUR
T1 - CDFKD-MFS: Collaborative Data-Free Knowledge Distillation via Multi-Level Feature Sharing
T2 - IEEE Transactions on Multimedia
AU - Hao, Zhiwei
AU - Luo, Yong
AU - Wang, Zhi
AU - Hu, Han
AU - An, Jianping
N1 - Publisher Copyright:
© 2022 IEEE.
PY - 2022
Y1 - 2022
N2 - Recently, the compression and deployment of powerful deep neural networks (DNNs) on resource-limited edge devices to provide intelligent services have become attractive tasks. Although knowledge distillation (KD) is a feasible solution for compression, its requirement for the original dataset raises privacy concerns. In addition, it is common to integrate multiple pretrained models to achieve satisfactory performance. Compressing multiple such models into a single tiny model is challenging, especially when the original data are unavailable. To tackle this challenge, we propose a framework termed collaborative data-free knowledge distillation via multi-level feature sharing (CDFKD-MFS), which consists of a multi-header student module, an asymmetric adversarial data-free KD module, and an attention-based aggregation module. In this framework, the student model, equipped with a multi-level feature-sharing structure, learns from multiple teacher models and is trained together with a generator in an asymmetric adversarial manner. When some real samples are available, the attention module adaptively aggregates the predictions of the student headers, which can further improve performance. We conduct extensive experiments on three popular computer vision datasets. In particular, compared with the most competitive alternative, the accuracy of the proposed framework is 1.18% higher on the CIFAR-100 dataset, 1.67% higher on the Caltech-101 dataset, and 2.99% higher on the mini-ImageNet dataset.
AB - Recently, the compression and deployment of powerful deep neural networks (DNNs) on resource-limited edge devices to provide intelligent services have become attractive tasks. Although knowledge distillation (KD) is a feasible solution for compression, its requirement for the original dataset raises privacy concerns. In addition, it is common to integrate multiple pretrained models to achieve satisfactory performance. Compressing multiple such models into a single tiny model is challenging, especially when the original data are unavailable. To tackle this challenge, we propose a framework termed collaborative data-free knowledge distillation via multi-level feature sharing (CDFKD-MFS), which consists of a multi-header student module, an asymmetric adversarial data-free KD module, and an attention-based aggregation module. In this framework, the student model, equipped with a multi-level feature-sharing structure, learns from multiple teacher models and is trained together with a generator in an asymmetric adversarial manner. When some real samples are available, the attention module adaptively aggregates the predictions of the student headers, which can further improve performance. We conduct extensive experiments on three popular computer vision datasets. In particular, compared with the most competitive alternative, the accuracy of the proposed framework is 1.18% higher on the CIFAR-100 dataset, 1.67% higher on the Caltech-101 dataset, and 2.99% higher on the mini-ImageNet dataset.
KW - Attention
KW - Data-Free Distillation
KW - Knowledge Distillation
KW - Model Compression
KW - Multi-Teacher Distillation
UR - http://www.scopus.com/inward/record.url?scp=85135213314&partnerID=8YFLogxK
U2 - 10.1109/TMM.2022.3192663
DO - 10.1109/TMM.2022.3192663
M3 - Article
AN - SCOPUS:85135213314
SN - 1520-9210
VL - 24
SP - 4262
EP - 4274
JO - IEEE Transactions on Multimedia
JF - IEEE Transactions on Multimedia
ER -