TY - JOUR
T1 - Boundary-sensitive denoised temporal reasoning network for video action segmentation
AU - Ma, Zhichao
AU - Li, Kan
N1 - Publisher Copyright: © The Author(s), under exclusive licence to Springer-Verlag London Ltd., part of Springer Nature 2024.
PY - 2024
Y1 - 2024
AB - Video action segmentation remains challenging because existing models confuse similar actions and action transition regions, leading to incorrect action inferences and severe over-segmentation errors. To address these issues, we present a novel action segmentation framework, the boundary-sensitive denoised temporal reasoning network, in which a novel boundary-driven refiner (BR) receives boundary cues from a novel boundary detector (BD) to predict the segmentation. The BD perceives action boundaries more accurately by overcoming the disturbance of similar actions and response shifts through several novel structures that complement each other. The BR is built on a graph energy structure whose strong temporal reasoning ability derives from overcoming noisy features. Roughly, an energy-based structure adaptively adjusts message passing on the graph for noise immunity, and a training mechanism adaptively adjusts margins, increasing feature distinguishability among similar actions. The cooperation of BD and BR substantially improves segmentation quality, and they can be embedded into other models. Our framework is shown to be effective in overcoming the above issues and achieves some new state-of-the-art results in accuracy, edit score, and F1 score on the challenging 50Salads, GTEA, and Breakfast benchmarks.
KW - Action relation modeling
KW - Boundary-sensitive network
KW - Energy-guided graph reasoning
KW - Video action segmentation
KW - Video encoding
UR - http://www.scopus.com/inward/record.url?scp=85191234429&partnerID=8YFLogxK
U2 - 10.1007/s11760-024-03199-w
DO - 10.1007/s11760-024-03199-w
M3 - Article
AN - SCOPUS:85191234429
SN - 1863-1703
JO - Signal, Image and Video Processing
JF - Signal, Image and Video Processing
ER -