TY - GEN
T1 - Adaptive Recursive Circle Framework for Fine-Grained Action Recognition
AU - Lin, Hanxi
AU - Zhao, Wentian
AU - Wu, Xinxiao
N1 - Publisher Copyright:
© 2022 IEEE.
PY - 2022
Y1 - 2022
AB - Intuitively, distinguishing fine-grained actions in videos requires recursively capturing subtle visual cues and learning abstract features. However, existing methods based on deep neural networks are counter-intuitive in that their network layers do not explicitly model recursive feature abstraction. We therefore propose an Adaptive Recursive Circle (ARC) framework that equips common neural network layers with recursive attention and recursive fusion. An ARC layer inherits the same operators and parameters as the original layer but, most critically, treats the layer input as an evolving state, thus explicitly achieving recursive feature abstraction by alternating between state updates and feature generation. Specifically, at each recursive step, the input state is first updated from the previously generated features via both recursive attention and recursive fusion, and feature abstraction is then performed on the newly updated input state. Significant improvements are observed on multiple datasets. For example, an ARC-equipped TSM-ResNet-18 outperforms TSM-ResNet-50 on the Something-Something V1 and Diving48 datasets with only half the overhead. Code will be available at: https://github.com/0HaNC/ARC-ActionRecog.
KW - fine-grained action recognition
KW - recursive representation
KW - representation learning
KW - visual reasoning
UR - http://www.scopus.com/inward/record.url?scp=85137739490&partnerID=8YFLogxK
U2 - 10.1109/ICME52920.2022.9859982
DO - 10.1109/ICME52920.2022.9859982
M3 - Conference contribution
AN - SCOPUS:85137739490
T3 - Proceedings - IEEE International Conference on Multimedia and Expo
BT - ICME 2022 - IEEE International Conference on Multimedia and Expo 2022, Proceedings
PB - IEEE Computer Society
T2 - 2022 IEEE International Conference on Multimedia and Expo, ICME 2022
Y2 - 18 July 2022 through 22 July 2022
ER -