TY - GEN
T1 - Revisiting Cross-Architecture Distillation
T2 - 40th AAAI Conference on Artificial Intelligence, AAAI 2026
AU - Peng, Ying
AU - Ye, Hongsen
AU - Huang, Changxin
AU - Hu, Xiping
AU - Chen, Jian
AU - Zeng, Runhao
N1 - Publisher Copyright:
© 2026, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
PY - 2026
Y1 - 2026
N2 - Vision Transformers (ViTs) have achieved strong performance in video action recognition, but their high computational cost limits their practicality. Lightweight CNNs are more efficient but suffer from accuracy gaps. Cross-Architecture Knowledge Distillation (CAKD) addresses this by transferring knowledge from ViTs to CNNs, yet existing methods often struggle with architectural mismatch and overlook the value of stronger homogeneous CNN teachers. To tackle these challenges, we propose a Dual-Teacher Knowledge Distillation framework that leverages both a heterogeneous ViT teacher and a homogeneous CNN teacher to collaboratively guide a lightweight CNN student. We introduce two key components: (1) Discrepancy-Aware Teacher Weighting, which dynamically fuses the predictions from ViT and CNN teachers by assigning adaptive weights based on teacher confidence and prediction discrepancy with the student, enabling more informative and effective supervision; and (2) a Structure Discrepancy-Aware Distillation strategy, where the student learns the residual features between ViT and CNN teachers via a lightweight auxiliary branch, focusing on transferable architectural differences without mimicking all of ViT’s high-dimensional patterns. Extensive experiments on benchmarks including HMDB51, EPIC-KITCHENS-100, and Kinetics-400, demonstrate that our method consistently outperforms state-of-the-art distillation approaches, achieving notable performance improvements with a maximum accuracy gain of 5.95% on HMDB51.
AB - Vision Transformers (ViTs) have achieved strong performance in video action recognition, but their high computational cost limits their practicality. Lightweight CNNs are more efficient but suffer from accuracy gaps. Cross-Architecture Knowledge Distillation (CAKD) addresses this by transferring knowledge from ViTs to CNNs, yet existing methods often struggle with architectural mismatch and overlook the value of stronger homogeneous CNN teachers. To tackle these challenges, we propose a Dual-Teacher Knowledge Distillation framework that leverages both a heterogeneous ViT teacher and a homogeneous CNN teacher to collaboratively guide a lightweight CNN student. We introduce two key components: (1) Discrepancy-Aware Teacher Weighting, which dynamically fuses the predictions from ViT and CNN teachers by assigning adaptive weights based on teacher confidence and prediction discrepancy with the student, enabling more informative and effective supervision; and (2) a Structure Discrepancy-Aware Distillation strategy, where the student learns the residual features between ViT and CNN teachers via a lightweight auxiliary branch, focusing on transferable architectural differences without mimicking all of ViT’s high-dimensional patterns. Extensive experiments on benchmarks including HMDB51, EPIC-KITCHENS-100, and Kinetics-400, demonstrate that our method consistently outperforms state-of-the-art distillation approaches, achieving notable performance improvements with a maximum accuracy gain of 5.95% on HMDB51.
UR - https://www.scopus.com/pages/publications/105034965959
U2 - 10.1609/aaai.v40i10.37786
DO - 10.1609/aaai.v40i10.37786
M3 - Conference contribution
AN - SCOPUS:105034965959
SN - 9781577359067
SN - 9781577359067
SN - 9781577359067
SN - 9781577359067
SN - 9781577359067
SN - 9781577359067
SN - 9781577359067
SN - 9781577359067
SN - 9781577359067
SN - 9781577359067
SN - 9781577359067
SN - 9781577359067
SN - 9781577359067
SN - 9781577359067
SN - 9781577359067
SN - 9781577359067
SN - 9781577359067
SN - 9781577359067
SN - 9781577359067
SN - 9781577359067
SN - 9781577359067
SN - 9781577359067
SN - 9781577359067
SN - 9781577359067
SN - 9781577359067
SN - 9781577359067
SN - 9781577359067
SN - 9781577359067
SN - 9781577359067
SN - 9781577359067
SN - 9781577359067
SN - 9781577359067
SN - 9781577359067
SN - 9781577359067
SN - 9781577359067
SN - 9781577359067
SN - 9781577359067
SN - 9781577359067
SN - 9781577359067
SN - 9781577359067
SN - 9781577359067
SN - 9781577359067
SN - 9781577359067
SN - 9781577359067
SN - 9781577359067
SN - 9781577359067
SN - 9781577359067
T3 - Proceedings of the AAAI Conference on Artificial Intelligence
SP - 8367
EP - 8375
BT - Proceedings of the AAAI Conference on Artificial Intelligence
A2 - Koenig, Sven
A2 - Jenkins, Chad
A2 - Taylor, Matthew E.
PB - Association for the Advancement of Artificial Intelligence
Y2 - 20 January 2026 through 27 January 2026
ER -