TY - JOUR
T1 - Hierarchical Self-Distilled Feature Learning for Fine-Grained Visual Categorization
AU - Hu, Yutao
AU - Jiang, Xiaolong
AU - Liu, Xuhui
AU - Luo, Xiaoyan
AU - Hu, Yao
AU - Cao, Xianbin
AU - Zhang, Baochang
AU - Zhang, Jun
N1 - Publisher Copyright:
© 2012 IEEE.
PY - 2025
Y1 - 2025
N2 - Fine-grained visual categorization (FGVC) relies on hierarchical features extracted by deep convolutional neural networks (CNNs) to recognize visually similar objects. In particular, shallow-layer features containing rich spatial details are vital for capturing subtle differences between objects but are usually inadequately optimized due to gradient vanishing during backpropagation. In this article, hierarchical self-distillation (HSD) is introduced to generate well-optimized CNN features for accurate fine-grained categorization. HSD builds on widely applied deep supervision and implements multiple intermediate losses to reinforce gradients. However, we observe that the hard (one-hot) labels adopted for intermediate supervision hurt FGVC performance by enforcing overly strict supervision. As a solution, HSD employs self-distillation, where soft predictions generated by deeper layers of the network are hierarchically exploited to supervise shallower parts. Moreover, a self-information entropy loss (SIELoss) is designed in HSD to adaptively soften intermediate predictions and facilitate better convergence. In addition, a gradient-detached fusion (GDF) module is incorporated to produce an ensemble result from multiscale features via effective feature fusion. Extensive experiments on four challenging fine-grained datasets show that, with a negligible parameter increase, the proposed HSD framework and the GDF module both bring significant performance gains over different backbones and achieve state-of-the-art classification performance.
AB - Fine-grained visual categorization (FGVC) relies on hierarchical features extracted by deep convolutional neural networks (CNNs) to recognize visually similar objects. In particular, shallow-layer features containing rich spatial details are vital for capturing subtle differences between objects but are usually inadequately optimized due to gradient vanishing during backpropagation. In this article, hierarchical self-distillation (HSD) is introduced to generate well-optimized CNN features for accurate fine-grained categorization. HSD builds on widely applied deep supervision and implements multiple intermediate losses to reinforce gradients. However, we observe that the hard (one-hot) labels adopted for intermediate supervision hurt FGVC performance by enforcing overly strict supervision. As a solution, HSD employs self-distillation, where soft predictions generated by deeper layers of the network are hierarchically exploited to supervise shallower parts. Moreover, a self-information entropy loss (SIELoss) is designed in HSD to adaptively soften intermediate predictions and facilitate better convergence. In addition, a gradient-detached fusion (GDF) module is incorporated to produce an ensemble result from multiscale features via effective feature fusion. Extensive experiments on four challenging fine-grained datasets show that, with a negligible parameter increase, the proposed HSD framework and the GDF module both bring significant performance gains over different backbones and achieve state-of-the-art classification performance.
KW - Feature fusion
KW - fine-grained visual categorization (FGVC)
KW - hierarchical feature learning
KW - knowledge distillation
UR - http://www.scopus.com/inward/record.url?scp=86000425843&partnerID=8YFLogxK
U2 - 10.1109/TNNLS.2021.3124135
DO - 10.1109/TNNLS.2021.3124135
M3 - Article
C2 - 34780336
AN - SCOPUS:86000425843
SN - 2162-237X
VL - 36
SP - 4005
EP - 4018
JO - IEEE Transactions on Neural Networks and Learning Systems
JF - IEEE Transactions on Neural Networks and Learning Systems
IS - 3
ER -