FPD: Feature Pyramid Knowledge Distillation

Qi Wang, Lu Liu, Wenxin Yu*, Zhiqiang Zhang, Yuxin Liu, Shiyu Cheng, Xuewen Zhang, Jun Gong

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Knowledge distillation is a commonly used method for model compression. It aims to compress a powerful yet cumbersome model into a lightweight model without much sacrifice of performance, so that the accuracy of the lightweight model approaches that of the cumbersome one. Conventionally, the powerful but bulky model is called the teacher model and the lightweight model is called the student model. Various approaches have been proposed for this purpose over the past few years. Some classical distillation methods are mainly based on distilling deep features from the intermediate layers or the logits layer, and some methods combine knowledge distillation with contrastive learning. However, classical distillation methods leave a significant gap in feature representation between teacher and student, and contrastive-learning distillation methods require massive, diversified data for training. To address these issues, our study aims to narrow the gap in feature representation between teacher and student and to extract richer feature representations from images in limited datasets, thereby achieving better performance. The superiority of our method is validated on a general-purpose dataset (CIFAR-100) and a small-scale dataset (CIFAR-10). On CIFAR-100, we achieve 19.21% and 20.01% top-1 error with ResNet50 and ResNet18, respectively. In particular, ResNet50 and ResNet18 as student models achieve better performance than the pre-trained ResNet152 and ResNet34 teacher models. On CIFAR-10, we achieve 4.22% top-1 error with ResNet18. On both CIFAR-10 and CIFAR-100 we achieve better performance, and the student model even outperforms the teacher.
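For context on the classical logit-based distillation the abstract contrasts with, the sketch below shows a generic Hinton-style knowledge distillation loss (temperature-softened KL divergence plus hard-label cross-entropy). This is only an illustration of the baseline technique, not the paper's FPD method, whose details are not given here; the temperature T and weight alpha are assumed, illustrative hyperparameters.

```python
# Minimal sketch of classical logit-based knowledge distillation (Hinton-style).
# NOT the FPD method from this paper; T and alpha are illustrative values.
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    """Temperature-softened KL term (teacher -> student) plus hard-label cross-entropy."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)  # rescale so gradients are comparable across temperatures
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard
```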

Original language: English
Title of host publication: Neural Information Processing - 29th International Conference, ICONIP 2022, Proceedings
Editors: Mohammad Tanveer, Sonali Agarwal, Seiichi Ozawa, Asif Ekbal, Adam Jatowt
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 100-111
Number of pages: 12
ISBN (Print): 9783031301049
DOIs
Publication status: Published - 2023
Externally published: Yes
Event: 29th International Conference on Neural Information Processing, ICONIP 2022 - Virtual, Online
Duration: 22 Nov 2022 - 26 Nov 2022

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 13623 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 29th International Conference on Neural Information Processing, ICONIP 2022
City: Virtual, Online
Period: 22/11/22 - 26/11/22

Keywords

  • Feature pyramid distillation
  • Feature pyramid network
  • Knowledge distillation
