上下文建模与推理的视频异常事件检测

Translated title of the contribution: Context Modeling and Reasoning for Video Abnormal Event Detection

Che Sun, Yu Wei Wu*, Yun De Jia

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Video abnormal event detection aims to automatically detect events that do not conform to the regularities of normal events in videos. Many normal and abnormal events in videos arise from interactions between event objects and scenes or other objects, so they are usually object-centric and highly contextual. Discriminating abnormal events by acquiring high-level semantic context information from low-level visual features in videos remains an open problem. To this end, we propose a novel context modeling and reasoning method for video abnormal event detection. The method mines event-related semantic context information from video data by generating video context graphs, which narrows the semantic gap between the low-level visual features in videos and the high-level semantics of abnormal events, and then uses this semantic context information to discriminate abnormal events in videos. Specifically, we first use a pre-trained object detection neural network to extract the initial appearance features of all objects, the spatiotemporal relationship features between different objects, and the scene features. We then devise a context graph inference module to explicitly model three types of semantic context: individual object behaviors, pairwise relationships among different objects, and interactions between objects and scenes. The nodes of the graph describe the object and scene features, and the edges describe the spatiotemporal relationship features. Finally, we build an anomaly prediction module to discriminate abnormal events according to the semantic contexts captured by the context graph. The proposed context graph inference module is based on mean-field theory and includes multiple recurrent neural networks with message-passing modules. The message-passing modules iteratively update the states of nodes and edges in the context graph to infer high-level semantic contexts from the low-level feature representations. The proposed anomaly prediction module consists of two attention-pooling network layers and one fully-connected network layer. The obtained context information is fed into the anomaly prediction module to calculate anomaly scores for all video frames. In experiments, we introduce a strategy to train the network models in four manners: unsupervised, semi-supervised, weakly supervised and supervised. In this way, the spatiotemporal context graph inference module and the anomaly prediction module are trained end to end, such that they reinforce each other. The context reasoning method is evaluated on four challenging public datasets: three semi-supervised datasets, i.e., the Subway (Entrance/Exit) dataset, the Avenue dataset and the ShanghaiTech dataset, and one supervised dataset, UCF-Crime. Compared with existing methods without context modeling and reasoning, our method improves the unsupervised AUC values by 2.7%/3.1%, 2.0% and 2.9% on the Subway (Entrance/Exit), Avenue and ShanghaiTech datasets, respectively, and improves the semi-supervised AUC values by 3.5%/3.3%, 4.0% and 4.3%, respectively. On the supervised UCF-Crime dataset, compared with existing methods that do not consider context modeling and reasoning, our method significantly improves the semi-supervised, weakly supervised and supervised AUC values by 2.1%, 0.4% and 9.2%, respectively.
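
The pipeline described in the abstract (iterative message passing over a context graph, followed by attention pooling and a fully-connected layer that yields an anomaly score) can be illustrated with a minimal sketch. This is not the authors' implementation. It makes several assumptions the abstract does not confirm: a plain tanh update stands in for the recurrent units, one attention pool is applied to node states and one to edge states, all weights are random and untrained, and every name (`message_passing_step`, `attention_pool`, `anomaly_score`, the example node labels) is hypothetical.

```python
import math
import random


def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def vec_add(*vectors):
    """Element-wise sum of equally sized vectors."""
    return [sum(column) for column in zip(*vectors)]


def tanh_vec(x):
    return [math.tanh(v) for v in x]


def message_passing_step(nodes, edges, params):
    """One round: edges read their endpoint nodes, then nodes read incident edges.

    nodes: dict name -> state vector; edges: dict (src, dst) -> state vector.
    Assumption: a tanh update replaces the recurrent units named in the abstract.
    """
    W_ee, W_en, W_nn, W_ne = params
    new_edges = {}
    for (src, dst), state in edges.items():
        msg = vec_add(matvec(W_en, nodes[src]), matvec(W_en, nodes[dst]))
        new_edges[(src, dst)] = tanh_vec(vec_add(matvec(W_ee, state), msg))
    new_nodes = {}
    for name, state in nodes.items():
        incident = [e for (s, d), e in new_edges.items() if name in (s, d)]
        if incident:
            mean_msg = [sum(c) / len(incident) for c in zip(*incident)]
        else:
            mean_msg = [0.0] * len(state)
        update = vec_add(matvec(W_nn, state), matvec(W_ne, mean_msg))
        new_nodes[name] = tanh_vec(update)
    return new_nodes, new_edges


def attention_pool(vectors, query):
    """Softmax-weighted average of vectors, scored by dot product with query."""
    scores = [sum(q * v for q, v in zip(query, vec)) for vec in vectors]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [sum(w * vec[i] for w, vec in zip(weights, vectors)) / total
            for i in range(len(query))]


def anomaly_score(nodes, edges, q_node, q_edge, w_out, b_out):
    """Pool node and edge states separately, then map them to a score in (0, 1)."""
    pooled = (attention_pool(list(nodes.values()), q_node)
              + attention_pool(list(edges.values()), q_edge))
    logit = sum(w * p for w, p in zip(w_out, pooled)) + b_out
    return 1.0 / (1.0 + math.exp(-logit))


# Toy frame: two objects and one scene node, fully connected (hypothetical labels).
rng = random.Random(0)
dim = 4


def rand_vec(n):
    return [rng.uniform(-1.0, 1.0) for _ in range(n)]


nodes = {name: rand_vec(dim) for name in ("person", "bicycle", "scene")}
edges = {pair: rand_vec(dim) for pair in
         (("person", "bicycle"), ("person", "scene"), ("bicycle", "scene"))}
params = tuple([rand_vec(dim) for _ in range(dim)] for _ in range(4))

for _ in range(3):  # a few inference iterations
    nodes, edges = message_passing_step(nodes, edges, params)

score = anomaly_score(nodes, edges, rand_vec(dim), rand_vec(dim),
                      rand_vec(2 * dim), 0.0)
print(f"anomaly score: {score:.3f}")
```

With trained weights, a score of this kind would be computed for every frame and the frame-level scores compared against ground-truth labels to obtain the AUC values reported above. With the random weights used here, the printed number carries no meaning.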

Original language: Chinese (Traditional)
Pages (from-to): 2368-2386
Number of pages: 19
Journal: Jisuanji Xuebao/Chinese Journal of Computers
Volume: 47
Issue number: 10
Publication status: Published - Oct 2024
