Tackling confusion among actions for action segmentation with adaptive margin and energy-driven refinement

Zhichao Ma; Kan Li

doi:10.1007/s00138-023-01505-z

Zhichao Ma, Kan Li*

*Corresponding author for this work

School of Computer Science and Technology

Beijing Institute of Technology

Research output: Contribution to journal › Article › peer-review

Abstract

Video action segmentation is a crucial task in evaluating the ability to understand human activities. Previous works on this task mainly focus on capturing complex temporal structures and fail to consider the feature ambiguity among similar actions and the bias in training sets, so they easily confuse some actions. In this paper, we propose a novel action segmentation framework, called DeConfuNet, to address these issues. First, we design a discriminative enhancement module (DEM) trained with adaptive margin-guided discriminative feature learning, which adjusts the margin adaptively to increase the feature distinguishability among similar actions; the module's multi-stage reasoning and adaptive feature fusion structures provide structural advantages for distinguishing similar actions. Second, we propose an equalizing influence module (EIM) that can overcome the impact of biased training sets by balancing the influence of training samples under a coefficient-adaptive loss function. Third, an energy and context-driven refinement module (ECRM) further alleviates the impact of the unbalanced influence of training samples by fusing and refining the inferences of DEM and EIM; it uses phased predictions, including context and energy clues, to assimilate untrustworthy segments, which greatly alleviates over-segmentation. Extensive experiments show the effectiveness of each proposed technique and verify that the DEM and EIM are complementary in reasoning and cooperate to overcome the confusion issue. Our approach achieves significant improvement and state-of-the-art performance in accuracy, edit score, and F1 score on the challenging 50Salads, GTEA, and Breakfast benchmarks.

Original language	English
Article number	21
Journal	Machine Vision and Applications
Volume	35
Issue number	2
DOIs	https://doi.org/10.1007/s00138-023-01505-z
Publication status	Published - Mar 2024

Keywords

Action assimilation operator
Action segmentation
Adaptive margin-guided discriminative feature learning
Coefficient-adaptive loss function
Energy and context-driven refinement module

Cite this

@article{b62a59a505b240ac81eea39a4a32d464,

title = "Tackling confusion among actions for action segmentation with adaptive margin and energy-driven refinement",

abstract = "Video action segmentation is a crucial task in evaluating the ability to understand human activities. Previous works on this task mainly focus on capturing complex temporal structures and fail to consider the feature ambiguity among similar actions and the bias in training sets, so they easily confuse some actions. In this paper, we propose a novel action segmentation framework, called DeConfuNet, to address these issues. First, we design a discriminative enhancement module (DEM) trained with adaptive margin-guided discriminative feature learning, which adjusts the margin adaptively to increase the feature distinguishability among similar actions; the module's multi-stage reasoning and adaptive feature fusion structures provide structural advantages for distinguishing similar actions. Second, we propose an equalizing influence module (EIM) that can overcome the impact of biased training sets by balancing the influence of training samples under a coefficient-adaptive loss function. Third, an energy and context-driven refinement module (ECRM) further alleviates the impact of the unbalanced influence of training samples by fusing and refining the inferences of DEM and EIM; it uses phased predictions, including context and energy clues, to assimilate untrustworthy segments, which greatly alleviates over-segmentation. Extensive experiments show the effectiveness of each proposed technique and verify that the DEM and EIM are complementary in reasoning and cooperate to overcome the confusion issue. Our approach achieves significant improvement and state-of-the-art performance in accuracy, edit score, and F1 score on the challenging 50Salads, GTEA, and Breakfast benchmarks.",

keywords = "Action assimilation operator, Action segmentation, Adaptive margin-guided discriminative feature learning, Coefficient-adaptive loss function, Energy and context-driven refinement module",

author = "Zhichao Ma and Kan Li",

note = "Publisher Copyright: {\textcopyright} 2024, The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature.",

year = "2024",

month = mar,

doi = "10.1007/s00138-023-01505-z",

language = "English",

volume = "35",

journal = "Machine Vision and Applications",

issn = "0932-8092",

publisher = "Springer Verlag",

number = "2",

}

TY - JOUR

T1 - Tackling confusion among actions for action segmentation with adaptive margin and energy-driven refinement

AU - Ma, Zhichao

AU - Li, Kan

PY - 2024/3

Y1 - 2024/3

AB - Video action segmentation is a crucial task in evaluating the ability to understand human activities. Previous works on this task mainly focus on capturing complex temporal structures and fail to consider the feature ambiguity among similar actions and the bias in training sets, so they easily confuse some actions. In this paper, we propose a novel action segmentation framework, called DeConfuNet, to address these issues. First, we design a discriminative enhancement module (DEM) trained with adaptive margin-guided discriminative feature learning, which adjusts the margin adaptively to increase the feature distinguishability among similar actions; the module's multi-stage reasoning and adaptive feature fusion structures provide structural advantages for distinguishing similar actions. Second, we propose an equalizing influence module (EIM) that can overcome the impact of biased training sets by balancing the influence of training samples under a coefficient-adaptive loss function. Third, an energy and context-driven refinement module (ECRM) further alleviates the impact of the unbalanced influence of training samples by fusing and refining the inferences of DEM and EIM; it uses phased predictions, including context and energy clues, to assimilate untrustworthy segments, which greatly alleviates over-segmentation. Extensive experiments show the effectiveness of each proposed technique and verify that the DEM and EIM are complementary in reasoning and cooperate to overcome the confusion issue. Our approach achieves significant improvement and state-of-the-art performance in accuracy, edit score, and F1 score on the challenging 50Salads, GTEA, and Breakfast benchmarks.

KW - Action assimilation operator

KW - Action segmentation

KW - Adaptive margin-guided discriminative feature learning

KW - Coefficient-adaptive loss function

KW - Energy and context-driven refinement module

UR - http://www.scopus.com/inward/record.url?scp=85183325782&partnerID=8YFLogxK

U2 - 10.1007/s00138-023-01505-z

DO - 10.1007/s00138-023-01505-z

M3 - Article

AN - SCOPUS:85183325782

SN - 0932-8092

VL - 35

JO - Machine Vision and Applications

JF - Machine Vision and Applications

IS - 2

M1 - 21

ER -
