Boundary-sensitive denoised temporal reasoning network for video action segmentation

Zhichao Ma, Kan Li*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Video action segmentation remains challenging because existing models confuse similar actions and action-transition regions, leading to incorrect action inferences and severe over-segmentation errors. To address these issues, we present a novel action segmentation framework, the boundary-sensitive denoised temporal reasoning network, in which a novel boundary-driven refiner (BR) receives boundary cues from a novel boundary detector (BD) to predict the segmentation. The BD perceives action boundaries more accurately by overcoming the disturbance of similar actions and response shifts through complementary structures. The BR is built on a graph energy structure whose temporal reasoning ability derives from its robustness to noisy features: an energy-based structure adaptively adjusts message passing on the graph for noise immunity, while a training mechanism adaptively adjusts margins, increasing feature distinguishability among similar actions. The cooperation of BD and BR, which can also be embedded into other models, substantially improves segmentation quality. Our framework is shown to be effective in overcoming the above issues and achieves new state-of-the-art accuracy, edit score, and F1 score on the challenging 50Salads, GTEA, and Breakfast benchmarks.
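The abstract describes two mechanisms in the refiner: energy-gated message passing over a temporal graph of frame features, and an adaptive-margin training objective that pushes similar actions apart. The paper's actual architecture is not given here, so the PyTorch sketch below is only a minimal illustration of those two ideas under stated assumptions; the layer shapes, the sigmoid energy gate, the `EnergyGatedGraphLayer` and `adaptive_margin_loss` names, and the specific margin rule are all hypothetical, not the authors' implementation.

```python
# Hypothetical sketch (not the authors' released code): one energy-gated
# message-passing step over a temporal frame graph, plus a toy adaptive-margin loss.
import torch
import torch.nn as nn


class EnergyGatedGraphLayer(nn.Module):
    """Passes messages between temporally adjacent frames, scaled by a learned
    per-edge 'energy' gate so that noisy neighbours contribute less."""

    def __init__(self, dim: int):
        super().__init__()
        self.message = nn.Linear(dim, dim)
        self.energy = nn.Sequential(nn.Linear(2 * dim, 1), nn.Sigmoid())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (T, dim) frame-level features
        prev = torch.roll(x, shifts=1, dims=0)   # left temporal neighbour
        nxt = torch.roll(x, shifts=-1, dims=0)   # right temporal neighbour
        out = x.clone()
        for nb in (prev, nxt):
            gate = self.energy(torch.cat([x, nb], dim=-1))  # (T, 1) edge energy
            out = out + gate * self.message(nb)             # gated aggregation
        return out


def adaptive_margin_loss(feats: torch.Tensor, labels: torch.Tensor,
                         margin_scale: float = 0.5) -> torch.Tensor:
    """Toy adaptive-margin objective (an assumption about how 'adaptive margins'
    could be realised): pairs of frames from different actions are pushed apart
    by a margin that grows with their current cosine similarity."""
    feats = nn.functional.normalize(feats, dim=-1)
    sim = feats @ feats.t()                               # (T, T) cosine similarity
    diff = labels.unsqueeze(0) != labels.unsqueeze(1)     # pairs with different actions
    margin = margin_scale * sim.detach().clamp(min=0)     # harder pairs get larger margins
    return torch.relu(sim + margin)[diff].mean()


if __name__ == "__main__":
    T, D = 64, 32
    x = torch.randn(T, D)          # stand-in for encoded frame features
    y = torch.randint(0, 4, (T,))  # stand-in for frame-level action labels
    layer = EnergyGatedGraphLayer(D)
    refined = layer(x)
    print(refined.shape, adaptive_margin_loss(refined, y).item())
```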

Original language: English
Journal: Signal, Image and Video Processing
DOIs
Publication status: Accepted/In press - 2024

Keywords

  • Action relation modeling
  • Boundary-sensitive network
  • Energy-guided graph reasoning
  • Video action segmentation
  • Video encoding
