TY - GEN
T1 - Spatiotemporal pyramid pooling in 3D convolutional neural networks for action recognition
AU - Cheng, Cheng
AU - Lv, Pin
AU - Su, Bing
N1 - Publisher Copyright: © 2018 IEEE.
PY - 2018/8/29
Y1 - 2018/8/29
AB - Deep 3-dimensional convolutional networks (3D ConvNets) trained on large-scale video datasets have achieved promising results on action recognition. This paper improves their performance by incorporating spatiotemporal pyramid pooling. Specifically, we propose a spatiotemporal pyramid pooling layer to handle the temporal variations of video sequences. Building on this layer, we develop a new network architecture, called STPP-net, by integrating it into 3D ConvNets. The proposed network is robust to spatial and temporal variations in human actions and can generate a fixed-dimensional representation regardless of video size or scale. We show that our new architecture outperforms the original 3D ConvNets by a large margin on three large-scale video classification and action recognition benchmarks: HMDB51, UCF101, and Kinetics.
KW - 3D Convolutional Neural Networks
KW - Spatiotemporal Pyramid Pooling
KW - Video Recognition
UR - http://www.scopus.com/inward/record.url?scp=85062911341&partnerID=8YFLogxK
U2 - 10.1109/ICIP.2018.8451625
DO - 10.1109/ICIP.2018.8451625
M3 - Conference contribution
AN - SCOPUS:85062911341
T3 - Proceedings - International Conference on Image Processing, ICIP
SP - 3468
EP - 3472
BT - 2018 IEEE International Conference on Image Processing, ICIP 2018 - Proceedings
PB - IEEE Computer Society
T2 - 25th IEEE International Conference on Image Processing, ICIP 2018
Y2 - 7 October 2018 through 10 October 2018
ER -