TY - JOUR
T1 - Dynamic Perception Framework for Fine-Grained Recognition
AU - Ding, Yao
AU - Han, Zhenjun
AU - Zhou, Yanzhao
AU - Zhu, Yi
AU - Chen, Jie
AU - Ye, Qixiang
AU - Jiao, Jianbin
N1 - Publisher Copyright: © 1991-2012 IEEE.
PY - 2022/3/1
Y1 - 2022/3/1
N2 - Fine-grained recognition poses the challenge of discriminating categories with only small subtle visual differences, which can be easily overwhelmed by diverse appearance within categories. Conventional approaches generally locate discriminative parts and then recognize the part-based features. However, we find that tuning the effective receptive field (ERF) of the network to the task plays the key role, which enables significant regions to contribute more to the output. Inspired by the receptive field stimulation mechanism of the visual cortex, we propose a Dynamic Perception framework as a solution. Our framework adapts the ERF by considering the image space and the kernel space simultaneously. In the image space, the Spatial Selective Sampling module is adopted to enlarge informative regions locally. In the kernel space, Spatial Selective Kernel convolution is introduced to adapt different kernel sizes for regions of interest and backgrounds by embedding spatial attention in the multi-path convolution. Extensive experiments on challenging benchmarks, including CUB-200-2011, FGVC-Aircraft, and Stanford Cars, demonstrate that our method yields a performance boost over the state-of-the-art methods.
AB - Fine-grained recognition poses the challenge of discriminating categories with only small subtle visual differences, which can be easily overwhelmed by diverse appearance within categories. Conventional approaches generally locate discriminative parts and then recognize the part-based features. However, we find that tuning the effective receptive field (ERF) of the network to the task plays the key role, which enables significant regions to contribute more to the output. Inspired by the receptive field stimulation mechanism of the visual cortex, we propose a Dynamic Perception framework as a solution. Our framework adapts the ERF by considering the image space and the kernel space simultaneously. In the image space, the Spatial Selective Sampling module is adopted to enlarge informative regions locally. In the kernel space, Spatial Selective Kernel convolution is introduced to adapt different kernel sizes for regions of interest and backgrounds by embedding spatial attention in the multi-path convolution. Extensive experiments on challenging benchmarks, including CUB-200-2011, FGVC-Aircraft, and Stanford Cars, demonstrate that our method yields a performance boost over the state-of-the-art methods.
KW - Dynamic perception
KW - fine-grained recognition
KW - spatial selective kernel
KW - spatial selective sampling
UR - https://www.scopus.com/pages/publications/85103794704
U2 - 10.1109/TCSVT.2021.3069835
DO - 10.1109/TCSVT.2021.3069835
M3 - Article
AN - SCOPUS:85103794704
SN - 1051-8215
VL - 32
SP - 1353
EP - 1365
JO - IEEE Transactions on Circuits and Systems for Video Technology
JF - IEEE Transactions on Circuits and Systems for Video Technology
IS - 3
ER -