3D convolutional two-stream network for action recognition in videos

Min Li, Yuezhu Qi, Jian Yang, Yanfang Zhang, Junxing Ren, Hong Du

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

4 Citations (Scopus)

Abstract

In recent years, action recognition based on two-stream networks has developed rapidly. However, most existing methods describe video content incompletely and with distortion, because they extract features from cropped and warped frames or at the clip level. This paper proposes a deep learning approach that preserves the complete temporal context of human actions in videos. The proposed architecture follows the two-stream design and adds a novel 3D Convolutional Network (3D ConvNet) and a pyramid pooling layer to form an end-to-end behavioral feature learning method. The 3D ConvNets extract video-level spatio-temporal features from two input streams: the RGB images and the corresponding optical flow. The multi-scale pyramid pooling layer maps the generated feature maps to a unified size regardless of the input video size. The final predictions result from fusing the softmax scores of the two streams, subject to a weighting factor for each stream. Our experimental results favor the spatial stream slightly over the temporal stream, and the performance of the trained model is optimized under this condition. The proposed method is evaluated on two challenging video action datasets, UCF101 and HMDB51, and achieves state-of-the-art performance, above 96.1% on the UCF101 dataset.
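The two mechanisms named in the abstract, pooling variable-size feature maps to a fixed length and fusing per-stream softmax scores with a weight, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the pyramid levels (1, 2), the use of max pooling, the (C, T, H, W) layout, and the default `spatial_weight` are all assumptions chosen for illustration.

```python
import numpy as np


def _bin_edges(size, n):
    """Split range(size) into n non-empty (start, end) bins (floor/ceil edges)."""
    return [((i * size) // n, ((i + 1) * size + n - 1) // n) for i in range(n)]


def pyramid_pool_3d(features, levels=(1, 2)):
    """Max-pool a (C, T, H, W) feature map into a fixed-length vector.

    For each pyramid level n, the T, H and W axes are each split into n bins,
    and every bin is max-pooled per channel. The output length is
    C * sum(n**3 for n in levels), independent of T, H and W.
    Levels and pooling type are illustrative assumptions.
    """
    pooled = []
    for n in levels:
        t_bins, h_bins, w_bins = (_bin_edges(s, n) for s in features.shape[1:])
        for t0, t1 in t_bins:
            for h0, h1 in h_bins:
                for w0, w1 in w_bins:
                    cell = features[:, t0:t1, h0:h1, w0:w1]
                    pooled.append(cell.max(axis=(1, 2, 3)))
    return np.concatenate(pooled)


def softmax(logits):
    """Numerically stable softmax over a 1-D array of class logits."""
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()


def fuse_scores(spatial_logits, temporal_logits, spatial_weight=0.5):
    """Weighted average of the two streams' softmax scores.

    spatial_weight is a hypothetical parameter; the abstract does not
    state the weighting values used in the paper.
    """
    return (spatial_weight * softmax(spatial_logits)
            + (1.0 - spatial_weight) * softmax(temporal_logits))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    short_clip = rng.normal(size=(8, 16, 14, 14))
    long_clip = rng.normal(size=(8, 30, 20, 25))
    # Both clips yield vectors of the same length despite different sizes.
    print(pyramid_pool_3d(short_clip).shape, pyramid_pool_3d(long_clip).shape)
    print(fuse_scores(np.array([2.0, 1.0, 0.0]), np.array([0.0, 1.0, 2.0]), 0.6))
```

Because the pooled vector has the same length for any input size, a fully connected classifier could follow it without cropping or warping the video, which is the property the abstract attributes to the pyramid pooling layer.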

Original language: English
Title of host publication: Proceedings - IEEE 31st International Conference on Tools with Artificial Intelligence, ICTAI 2019
Publisher: IEEE Computer Society
Pages: 1697-1701
Number of pages: 5
ISBN (Electronic): 9781728137988
Publication status: Published - Nov 2019
Event: 31st IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2019 - Portland, United States
Duration: 4 Nov 2019 to 6 Nov 2019

Publication series

Name: Proceedings - International Conference on Tools with Artificial Intelligence, ICTAI
Volume: 2019-November
ISSN (Print): 1082-3409

Conference

Conference: 31st IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2019
Country/Territory: United States
City: Portland
Period: 4/11/19 to 6/11/19

Keywords

  • 3D ConvNets
  • Action recognition
  • Pyramid pooling layer
  • Video-level feature representation
