Boundary-sensitive denoised temporal reasoning network for video action segmentation

Zhichao Ma, Kan Li*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Video action segmentation remains challenging because existing models confuse similar actions and action transition regions, leading to incorrect action inferences and severe over-segmentation errors. To address these issues, we present a novel action segmentation framework, the boundary-sensitive denoised temporal reasoning network, in which a novel boundary-driven refiner (BR) receives boundary cues from a novel boundary detector (BD) to predict the segmentation. The BD perceives action boundaries more accurately by overcoming the disturbance of similar actions and response shifts through complementary structural designs. The BR is built on a graph energy structure whose strong temporal reasoning ability stems from its robustness to noisy features: an energy-based structure adaptively adjusts message passing on the graph for noise immunity, while a training mechanism adaptively adjusts margins to increase feature distinguishability among similar actions. The cooperation of BD and BR substantially improves segmentation quality and can be embedded into other models. Our framework is shown to be effective in overcoming the above issues and achieves new state-of-the-art accuracy, edit score, and F1 score on the challenging 50Salads, GTEA, and Breakfast benchmarks.
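To make the boundary-driven refinement idea concrete, the following is a minimal, illustrative PyTorch sketch of boundary-gated temporal message passing: per-frame features are smoothed over their temporal neighbourhood, but the smoothing is suppressed where a boundary detector predicts an action transition. The class name BoundaryGatedRefiner, the tensor shapes, and the specific gating formula are assumptions for illustration only; the paper's actual BR additionally relies on an energy-based graph structure and adaptive margins that are not reproduced here.

```python
# Illustrative sketch (not the authors' implementation): boundary-gated
# temporal message passing for frame-wise action segmentation.
import torch
import torch.nn as nn


class BoundaryGatedRefiner(nn.Module):
    """Toy refiner: each frame aggregates its temporal neighbours, but the
    aggregation is down-weighted by the predicted boundary probability, so
    features are not smoothed across likely action transitions."""

    def __init__(self, dim: int, num_classes: int):
        super().__init__()
        self.mix = nn.Conv1d(dim, dim, kernel_size=3, padding=1)  # local temporal mixing
        self.cls = nn.Conv1d(dim, num_classes, kernel_size=1)     # per-frame classifier

    def forward(self, feats: torch.Tensor, boundary_prob: torch.Tensor) -> torch.Tensor:
        # feats: (B, C, T) frame features; boundary_prob: (B, 1, T) in [0, 1]
        gate = 1.0 - boundary_prob                 # pass less information near boundaries
        refined = feats + gate * self.mix(feats)   # gated residual temporal smoothing
        return self.cls(refined)                   # per-frame action logits (B, K, T)


if __name__ == "__main__":
    # Random tensors stand in for backbone features and detector outputs.
    feats = torch.randn(2, 64, 100)        # batch of 2, 64-dim features, 100 frames
    boundary_prob = torch.rand(2, 1, 100)  # hypothetical boundary detector output
    refiner = BoundaryGatedRefiner(dim=64, num_classes=19)
    logits = refiner(feats, boundary_prob)
    print(logits.shape)                    # torch.Size([2, 19, 100])
```

In this sketch the gate plays the role of the boundary cue passed from BD to BR: within a segment the refiner behaves like an ordinary temporal smoother, while near predicted boundaries the residual mixing is attenuated, which is one simple way to reduce over-segmentation without blurring action transitions.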

Original language: English
Journal: Signal, Image and Video Processing
DOI
Publication status: Accepted/In press - 2024
