TY - JOUR
T1 - Adversarial 3D Convolutional Auto-Encoder for Abnormal Event Detection in Videos
AU - Sun, Che
AU - Jia, Yunde
AU - Song, Hao
AU - Wu, Yuwei
N1 - Publisher Copyright:
© 1999-2012 IEEE.
PY - 2021
Y1 - 2021
N2 - Abnormal event detection aims to identify the events that deviate from expected normal patterns. Existing methods usually extract normal spatio-temporal patterns of appearance and motion in a separate manner, which ignores low-level correlations between appearance and motion patterns and may fall short of capturing fine-grained spatio-temporal patterns. In this paper, we propose to simultaneously learn appearance and motion to obtain fine-grained spatio-temporal patterns. To this end, we present an adversarial 3D convolutional auto-encoder to learn the normal spatio-temporal patterns and then identify abnormal events by diverging them from the learned normal patterns in videos. The encoder captures the low-level correlations between spatial and temporal dimensions of videos, and generates distinctive features representing visual spatio-temporal information. The decoder reconstrucccts the original video from the encoded features representing by 3D de-convolutions and learns the normal spatio-temporal patterns in an unsupervised manner. We introduce the denoising reconstruction error and adversarial learning strategy to train the 3D convolutional auto-encoder to implicitly learn accurate data distributions that are considered normal patterns, which benefits enhancing the reconstruction ability of the auto-encoder to discriminate abnormal events. Both the theoretical analysis and the extensive experiments on four publicly available datasets demonstrate the effectiveness of our method.
AB - Abnormal event detection aims to identify the events that deviate from expected normal patterns. Existing methods usually extract normal spatio-temporal patterns of appearance and motion in a separate manner, which ignores low-level correlations between appearance and motion patterns and may fall short of capturing fine-grained spatio-temporal patterns. In this paper, we propose to simultaneously learn appearance and motion to obtain fine-grained spatio-temporal patterns. To this end, we present an adversarial 3D convolutional auto-encoder to learn the normal spatio-temporal patterns and then identify abnormal events by diverging them from the learned normal patterns in videos. The encoder captures the low-level correlations between spatial and temporal dimensions of videos, and generates distinctive features representing visual spatio-temporal information. The decoder reconstrucccts the original video from the encoded features representing by 3D de-convolutions and learns the normal spatio-temporal patterns in an unsupervised manner. We introduce the denoising reconstruction error and adversarial learning strategy to train the 3D convolutional auto-encoder to implicitly learn accurate data distributions that are considered normal patterns, which benefits enhancing the reconstruction ability of the auto-encoder to discriminate abnormal events. Both the theoretical analysis and the extensive experiments on four publicly available datasets demonstrate the effectiveness of our method.
KW - Adversarial 3D convolutional auto-encoder
KW - abnormal event detection
KW - adversarial learning
KW - normal patterns
UR - http://www.scopus.com/inward/record.url?scp=85115994018&partnerID=8YFLogxK
U2 - 10.1109/TMM.2020.3023303
DO - 10.1109/TMM.2020.3023303
M3 - Article
AN - SCOPUS:85115994018
SN - 1520-9210
VL - 23
SP - 3292
EP - 3305
JO - IEEE Transactions on Multimedia
JF - IEEE Transactions on Multimedia
ER -