TY - JOUR
T1 - A Model Compression Method Based on Multi-Teacher Distillation for SAR Scene Classification
AU - Li, Qihai
AU - Zhang, Jiawei
AU - Zhou, Tian
AU - Qi, Baogui
AU - Chen, He
AU - Chen, Liang
N1 - Publisher Copyright:
© The Institution of Engineering & Technology 2023.
PY - 2023
Y1 - 2023
AB - In recent years, with the advancement of deep learning, convolutional neural networks for image analysis have attracted growing attention in the remote sensing domain. However, high-performing deep convolutional neural networks typically contain a large number of parameters, demanding substantial computational resources and time for training and inference. Because remote sensing platforms are resource-constrained, deploying such networks on these devices is challenging. To address this, this paper introduces a multi-teacher knowledge distillation method based on target differential probability. The approach reduces model complexity while preserving classification performance. By statistically analyzing the models' output probability distributions and computing the differential probabilities between the teacher and student networks, we regulate each teacher's output proportion within the multi-teacher model, thereby training a high-performing student network. Experiments on two distinct datasets confirm that the approach enables efficient knowledge transfer and improves performance.
KW - knowledge distillation
KW - model compression
KW - multiple teachers
KW - remote sensing
KW - scene classification
UR - http://www.scopus.com/inward/record.url?scp=85203195582&partnerID=8YFLogxK
U2 - 10.1049/icp.2024.1191
DO - 10.1049/icp.2024.1191
M3 - Conference article
AN - SCOPUS:85203195582
SN - 2732-4494
VL - 2023
SP - 817
EP - 821
JO - IET Conference Proceedings
JF - IET Conference Proceedings
IS - 47
T2 - IET International Radar Conference 2023, IRC 2023
Y2 - 3 December 2023 through 5 December 2023
ER -
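Note: the abstract above describes, at a high level, a multi-teacher distillation scheme in which each teacher's contribution to the student's training signal is governed by the probability difference between teacher and student outputs. The record does not specify the paper's exact "target differential probability" rule, so the following is only a minimal sketch, assuming PyTorch; the function multi_teacher_kd_loss and its weighting rule (a softmax over negative teacher-student divergence gaps) are hypothetical illustrations, not the authors' method.

    import torch
    import torch.nn.functional as F

    def multi_teacher_kd_loss(student_logits, teacher_logits_list, labels,
                              temperature=4.0, alpha=0.5):
        """Hard-label cross-entropy plus a weighted multi-teacher soft-label term.

        The per-teacher weights below are a hypothetical stand-in for the
        paper's target-differential-probability rule, which this record
        does not fully specify.
        """
        T = temperature
        log_p_student = F.log_softmax(student_logits / T, dim=1)

        # Measure each teacher's divergence from the student's soft outputs.
        kd_terms, gaps = [], []
        for t_logits in teacher_logits_list:
            p_teacher = F.softmax(t_logits / T, dim=1)
            # T*T rescaling keeps gradient magnitudes comparable across temperatures.
            kl = F.kl_div(log_p_student, p_teacher, reduction="batchmean") * T * T
            kd_terms.append(kl)
            gaps.append(kl.detach())

        # Assumed weighting: teachers whose outputs differ less from the
        # student receive larger normalized weights.
        weights = F.softmax(-torch.stack(gaps), dim=0)
        kd_loss = sum(w * k for w, k in zip(weights, kd_terms))

        # Blend the distillation term with the standard supervised loss.
        ce_loss = F.cross_entropy(student_logits, labels)
        return alpha * kd_loss + (1 - alpha) * ce_loss

In use, student_logits come from the compressed network and teacher_logits_list from the (frozen) high-capacity teachers evaluated on the same batch; only the student's parameters receive gradients.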