Boundary-sensitive denoised temporal reasoning network for video action segmentation

Zhichao Ma; Kan Li

doi:10.1007/s11760-024-03199-w

Zhichao Ma, Kan Li^*

^*Corresponding author for this work

School of Computer Science and Technology

Beijing Institute of Technology

Research output: Contribution to journal › Article › peer-review

Abstract

Video action segmentation remains challenging because existing models confuse similar actions and action transition regions, leading to incorrect action inferences and severe over-segmentation errors. To address these issues, we present a novel action segmentation framework, the boundary-sensitive denoised temporal reasoning network, in which a novel boundary-driven refiner (BR) receives boundary cues from a novel boundary detector (BD) to predict the segmentation. The BD perceives action boundaries more accurately by overcoming the disturbance of similar actions and response shifts through several novel, complementary structures. The BR is built on a graph energy structure whose strong temporal reasoning ability derives from overcoming noisy features. Roughly, an energy-based structure adaptively adjusts message passing on the graph for noise immunity, and a training mechanism adaptively adjusts margins to increase feature distinguishability among similar actions. The cooperation of BD and BR substantially improves segmentation quality, and the two modules can be embedded into other models. The framework is shown to be effective in overcoming the above issues and achieves some new state-of-the-art results in accuracy, edit score, and F1 score on the challenging 50Salads, GTEA, and Breakfast benchmarks.

Original language	English
Journal	Signal, Image and Video Processing
DOIs	https://doi.org/10.1007/s11760-024-03199-w
Publication status	Accepted/In press - 2024

Keywords

Action relation modeling
Boundary-sensitive network
Energy-guided graph reasoning
Video action segmentation
Video encoding

Access to Document

10.1007/s11760-024-03199-w

Cite this

Ma, Z., & Li, K. (Accepted/In press). Boundary-sensitive denoised temporal reasoning network for video action segmentation. Signal, Image and Video Processing. https://doi.org/10.1007/s11760-024-03199-w

@article{504dfc46b6984acf91802f531d03b3eb,

title = "Boundary-sensitive denoised temporal reasoning network for video action segmentation",

abstract = "Video action segmentation is still challenging since existing models confuse similar actions and actional transition regions, leading to incorrect action inferences and serious over-segmentation errors. To address these issues, we present a novel action segmentation framework, called boundary-sensitive denoised temporal reasoning network, which explores a novel boundary-driven refiner (BR) receiving the boundary clue from a novel boundary detector (BD) to predict the segmentation. Our BD can perceive action boundaries more accurately by overcoming the disturbance of similar actions and response shifts with some novel structures that complement each other. Our BR is built by a graph energy structure whose strong ability of temporal reasoning is derived from overcoming noisy features. Roughly, an energy-based structure adjusts message passing on the graph adaptively for noise immunity, also a training mechanism adjusts margins adaptively, increasing feature distinguishability among similar actions. The cooperation of BD and BR can improve the quality of segmentation hugely, which can be embedded into other models. Our framework is demonstrated to be effective in overcoming the above issues and achieves some new state-of-the-art performance of accuracy, edit score, and F1 score on the challenging 50Salads, GTEA, and Breakfast benchmarks.",

keywords = "Action relation modeling, Boundary-sensitive network, Energy-guided graph reasoning, Video action segmentation, Video encoding",

author = "Zhichao Ma and Kan Li",

note = "Publisher Copyright: {\textcopyright} The Author(s), under exclusive licence to Springer-Verlag London Ltd., part of Springer Nature 2024.",

year = "2024",

doi = "10.1007/s11760-024-03199-w",

language = "English",

journal = "Signal, Image and Video Processing",

issn = "1863-1703",

publisher = "Springer London",

}

TY - JOUR

T1 - Boundary-sensitive denoised temporal reasoning network for video action segmentation

AU - Ma, Zhichao

AU - Li, Kan

N1 - Publisher Copyright: © The Author(s), under exclusive licence to Springer-Verlag London Ltd., part of Springer Nature 2024.

PY - 2024

Y1 - 2024

AB - Video action segmentation is still challenging since existing models confuse similar actions and actional transition regions, leading to incorrect action inferences and serious over-segmentation errors. To address these issues, we present a novel action segmentation framework, called boundary-sensitive denoised temporal reasoning network, which explores a novel boundary-driven refiner (BR) receiving the boundary clue from a novel boundary detector (BD) to predict the segmentation. Our BD can perceive action boundaries more accurately by overcoming the disturbance of similar actions and response shifts with some novel structures that complement each other. Our BR is built by a graph energy structure whose strong ability of temporal reasoning is derived from overcoming noisy features. Roughly, an energy-based structure adjusts message passing on the graph adaptively for noise immunity, also a training mechanism adjusts margins adaptively, increasing feature distinguishability among similar actions. The cooperation of BD and BR can improve the quality of segmentation hugely, which can be embedded into other models. Our framework is demonstrated to be effective in overcoming the above issues and achieves some new state-of-the-art performance of accuracy, edit score, and F1 score on the challenging 50Salads, GTEA, and Breakfast benchmarks.

KW - Action relation modeling

KW - Boundary-sensitive network

KW - Energy-guided graph reasoning

KW - Video action segmentation

KW - Video encoding

UR - http://www.scopus.com/inward/record.url?scp=85191234429&partnerID=8YFLogxK

U2 - 10.1007/s11760-024-03199-w

DO - 10.1007/s11760-024-03199-w

M3 - Article

AN - SCOPUS:85191234429

SN - 1863-1703

JO - Signal, Image and Video Processing

JF - Signal, Image and Video Processing

ER -
