Action Shuffling for Weakly Supervised Temporal Localization

Xiao Yu Zhang; Haichao Shi; Changsheng Li; Xinchu Shi

doi:10.1109/TIP.2022.3185485

Corresponding author: Xinchu Shi

School of Computer Science and Technology

Research output: Contribution to journal (Article, peer-reviewed)

9 Citations (Scopus)

Abstract

Weakly supervised action localization is a challenging task with extensive applications, which aims to identify actions and the corresponding temporal intervals with only video-level annotations available. This paper analyzes the order-sensitive and location-insensitive properties of actions, and embodies them into a self-augmented learning framework to improve the weakly supervised action localization performance. To be specific, we propose a novel two-branch network architecture with intra/inter-action shuffling, referred to as ActShufNet. The intra-action shuffling branch lays out a self-supervised order prediction task to augment the video representation with inner-video relevance, whereas the inter-action shuffling branch imposes a reorganizing strategy on the existing action contents to augment the training set without resorting to any external resources. Furthermore, the global-local adversarial training is presented to enhance the model's robustness to irrelevant noises. Extensive experiments are conducted on three benchmark datasets, and the results clearly demonstrate the efficacy of the proposed method.
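The abstract describes the two shuffling branches only at a high level. The Python sketch below shows one plausible reading under stated assumptions. It is not the ActShufNet implementation, and it covers only data construction, not the network or the adversarial training. It assumes a video is a list of snippet-level feature vectors. Intra-action shuffling (order sensitivity) is modeled as splitting one action instance into K equal segments, permuting them, and using the permutation index as the self-supervised order-prediction target. Inter-action shuffling (location insensitivity) is modeled as concatenating existing action clips and background clips in random order while keeping the video-level label as the set of included classes. The names `Snippet`, `intra_action_shuffle`, and `inter_action_shuffle`, the segment count, and the premise that action clips are available at all are hypothetical. Under weak supervision they would have to come from something like the model's own predictions.

```python
import itertools
import random
from typing import FrozenSet, List, Sequence, Tuple

# Hypothetical representation: one snippet-level feature vector per time step.
Snippet = List[float]


def intra_action_shuffle(
    snippets: Sequence[Snippet], num_segments: int, rng: random.Random
) -> Tuple[List[Snippet], int]:
    """Build one order-prediction pretext sample from an action instance (sketch).

    Splits the snippets into `num_segments` equal-length segments (remainder
    snippets at the end are dropped), reorders the segments with a randomly
    chosen permutation, and returns the reordered snippets together with that
    permutation's index among all num_segments! orderings. A classifier would
    be trained to predict the index.
    """
    if num_segments < 2 or len(snippets) < num_segments:
        raise ValueError("need num_segments >= 2 and at least that many snippets")
    seg_len = len(snippets) // num_segments
    segments = [
        list(snippets[i * seg_len:(i + 1) * seg_len]) for i in range(num_segments)
    ]
    # Fixed (lexicographic) enumeration so each ordering has a stable class id.
    orderings = list(itertools.permutations(range(num_segments)))
    label = rng.randrange(len(orderings))
    shuffled = [snip for seg_idx in orderings[label] for snip in segments[seg_idx]]
    return shuffled, label


def inter_action_shuffle(
    action_clips: Sequence[Tuple[List[Snippet], str]],
    background_clips: Sequence[List[Snippet]],
    num_actions: int,
    rng: random.Random,
) -> Tuple[List[Snippet], FrozenSet[str]]:
    """Synthesize a new training video from existing clips (sketch).

    Picks `num_actions` (clip, class) pairs, mixes them with the background
    clips in random order (each clip stays contiguous), and concatenates
    everything. The video-level label is the set of chosen classes, which
    does not depend on where each clip ends up.
    """
    chosen = rng.sample(list(action_clips), num_actions)
    pieces = [list(clip) for clip, _ in chosen] + [list(bg) for bg in background_clips]
    rng.shuffle(pieces)
    video = [snip for piece in pieces for snip in piece]
    labels = frozenset(cls for _, cls in chosen)
    return video, labels


if __name__ == "__main__":
    rng = random.Random(0)
    action = [[float(t)] for t in range(9)]
    shuffled, order_label = intra_action_shuffle(action, num_segments=3, rng=rng)
    print(order_label, [s[0] for s in shuffled])

    clips = [([[1.0], [1.0]], "jump"), ([[2.0], [2.0], [2.0]], "run"), ([[3.0]], "throw")]
    video, labels = inter_action_shuffle(clips, [[[0.0]] * 4], num_actions=2, rng=rng)
    print(sorted(labels), len(video))
```

Because the order-prediction target ranges over all K! permutations, K is kept small in this sketch. In the inter-action function the label is unaffected by clip placement, which is the location-insensitivity property the abstract appeals to.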

Original language	English
Pages (from-to)	4447-4457
Number of pages	11
Journal	IEEE Transactions on Image Processing
Volume	31
DOIs	https://doi.org/10.1109/TIP.2022.3185485
Publication status	Published - 2022

Keywords

Temporal action localization
inter-action
intra-action
self-supervised

Cite this

Zhang, X. Y., Shi, H., Li, C., & Shi, X. (2022). Action shuffling for weakly supervised temporal localization. IEEE Transactions on Image Processing, 31, 4447-4457. https://doi.org/10.1109/TIP.2022.3185485

@article{d3fe83a72bde4fbdad935980464bf622,
  title = "Action Shuffling for Weakly Supervised Temporal Localization",
  abstract = "Weakly supervised action localization is a challenging task with extensive applications, which aims to identify actions and the corresponding temporal intervals with only video-level annotations available. This paper analyzes the order-sensitive and location-insensitive properties of actions, and embodies them into a self-augmented learning framework to improve the weakly supervised action localization performance. To be specific, we propose a novel two-branch network architecture with intra/inter-action shuffling, referred to as ActShufNet. The intra-action shuffling branch lays out a self-supervised order prediction task to augment the video representation with inner-video relevance, whereas the inter-action shuffling branch imposes a reorganizing strategy on the existing action contents to augment the training set without resorting to any external resources. Furthermore, the global-local adversarial training is presented to enhance the model's robustness to irrelevant noises. Extensive experiments are conducted on three benchmark datasets, and the results clearly demonstrate the efficacy of the proposed method.",
  keywords = "Temporal action localization, inter-action, intra-action, self-supervised",
  author = "Zhang, {Xiao Yu} and Haichao Shi and Changsheng Li and Xinchu Shi",
  note = "Publisher Copyright: {\textcopyright} 1992-2012 IEEE.",
  year = "2022",
  doi = "10.1109/TIP.2022.3185485",
  language = "English",
  volume = "31",
  pages = "4447--4457",
  journal = "IEEE Transactions on Image Processing",
  issn = "1057-7149",
  publisher = "Institute of Electrical and Electronics Engineers Inc.",
}

TY  - JOUR
T1  - Action Shuffling for Weakly Supervised Temporal Localization
AU  - Zhang, Xiao Yu
AU  - Shi, Haichao
AU  - Li, Changsheng
AU  - Shi, Xinchu
PY  - 2022
Y1  - 2022
AB  - Weakly supervised action localization is a challenging task with extensive applications, which aims to identify actions and the corresponding temporal intervals with only video-level annotations available. This paper analyzes the order-sensitive and location-insensitive properties of actions, and embodies them into a self-augmented learning framework to improve the weakly supervised action localization performance. To be specific, we propose a novel two-branch network architecture with intra/inter-action shuffling, referred to as ActShufNet. The intra-action shuffling branch lays out a self-supervised order prediction task to augment the video representation with inner-video relevance, whereas the inter-action shuffling branch imposes a reorganizing strategy on the existing action contents to augment the training set without resorting to any external resources. Furthermore, the global-local adversarial training is presented to enhance the model's robustness to irrelevant noises. Extensive experiments are conducted on three benchmark datasets, and the results clearly demonstrate the efficacy of the proposed method.
KW  - Temporal action localization
KW  - inter-action
KW  - intra-action
KW  - self-supervised
UR  - http://www.scopus.com/inward/record.url?scp=85133762521&partnerID=8YFLogxK
U2  - 10.1109/TIP.2022.3185485
DO  - 10.1109/TIP.2022.3185485
M3  - Article
C2  - 35763480
AN  - SCOPUS:85133762521
SN  - 1057-7149
VL  - 31
SP  - 4447
EP  - 4457
JO  - IEEE Transactions on Image Processing
JF  - IEEE Transactions on Image Processing
ER  - 
