Content Temporal Relation Network for temporal action proposal generation

Ming Gang Gan; Yan Zhang

doi:10.1016/j.patcog.2023.110245

Ming Gang Gan, Yan Zhang^*

^*Corresponding author for this work

School of Automation

Beijing Institute of Technology

Research output: Contribution to journal › Article › Peer-reviewed

2 citations (Scopus)

Abstract

Temporal action proposal generation is an essential step for untrimmed video analysis and gains much attention from academia. However, most of the prior works predict the confidence score of each proposal separately and neglect the relations between proposals, limiting their performance. In this work, we design a novel Content Temporal Relation Network (CTRNet) to generate temporal action proposals by exploring the content and temporal semantic relations between proposals simultaneously. Specifically, we design a proposal feature map generation layer to convert the temporal semantic relations of proposals into spatial relations. Based on the proposal feature map, we propose a content-temporal relation module, which applies a novel adaptive-dilated convolution to model the temporal semantic relations between proposals and designs a content-adaptive convolution operation to explore the content semantic relation between proposals. Considering the temporal and content semantic relations between proposals, CTRNet has learned discriminative proposal features to improve performance. Extensive experiments are performed on two mainstream temporal action detection datasets, and CTRNet significantly outperforms the previous state-of-the-art methods. The codes are available at https://github.com/YanZhang-bit/CTRNet.

Original language	English
Article number	110245
Journal	Pattern Recognition
Volume	149
DOI	https://doi.org/10.1016/j.patcog.2023.110245
Publication status	Published - May 2024

Cite this

Gan, M. G., & Zhang, Y. (2024). Content Temporal Relation Network for temporal action proposal generation. Pattern Recognition, 149, Article 110245. https://doi.org/10.1016/j.patcog.2023.110245

@article{54a0784e9f15441d8f841e1bf4211bb2,

title = "Content Temporal Relation Network for temporal action proposal generation",

abstract = "Temporal action proposal generation is an essential step for untrimmed video analysis and gains much attention from academia. However, most of the prior works predict the confidence score of each proposal separately and neglect the relations between proposals, limiting their performance. In this work, we design a novel Content Temporal Relation Network (CTRNet) to generate temporal action proposals by exploring the content and temporal semantic relations between proposals simultaneously. Specifically, we design a proposal feature map generation layer to convert the temporal semantic relations of proposals into spatial relations. Based on the proposal feature map, we propose a content-temporal relation module, which applies a novel adaptive-dilated convolution to model the temporal semantic relations between proposals and designs a content-adaptive convolution operation to explore the content semantic relation between proposals. Considering the temporal and content semantic relations between proposals, CTRNet has learned discriminative proposal features to improve performance. Extensive experiments are performed on two mainstream temporal action detection datasets, and CTRNet significantly outperforms the previous state-of-the-art methods. The codes are available at https://github.com/YanZhang-bit/CTRNet.",

keywords = "Proposal–proposal relations, Temporal action detection, Temporal action proposal generation, Untrimmed video analysis",

author = "Gan, {Ming Gang} and Yan Zhang",

note = "Publisher Copyright: {\textcopyright} 2024 Elsevier Ltd",

year = "2024",

month = may,

doi = "10.1016/j.patcog.2023.110245",

language = "English",

volume = "149",

journal = "Pattern Recognition",

issn = "0031-3203",

publisher = "Elsevier Ltd.",

}

TY - JOUR

T1 - Content Temporal Relation Network for temporal action proposal generation

AU - Gan, Ming Gang

AU - Zhang, Yan

PY - 2024/5

Y1 - 2024/5

N2 - Temporal action proposal generation is an essential step for untrimmed video analysis and gains much attention from academia. However, most of the prior works predict the confidence score of each proposal separately and neglect the relations between proposals, limiting their performance. In this work, we design a novel Content Temporal Relation Network (CTRNet) to generate temporal action proposals by exploring the content and temporal semantic relations between proposals simultaneously. Specifically, we design a proposal feature map generation layer to convert the temporal semantic relations of proposals into spatial relations. Based on the proposal feature map, we propose a content-temporal relation module, which applies a novel adaptive-dilated convolution to model the temporal semantic relations between proposals and designs a content-adaptive convolution operation to explore the content semantic relation between proposals. Considering the temporal and content semantic relations between proposals, CTRNet has learned discriminative proposal features to improve performance. Extensive experiments are performed on two mainstream temporal action detection datasets, and CTRNet significantly outperforms the previous state-of-the-art methods. The codes are available at https://github.com/YanZhang-bit/CTRNet.

AB - Temporal action proposal generation is an essential step for untrimmed video analysis and gains much attention from academia. However, most of the prior works predict the confidence score of each proposal separately and neglect the relations between proposals, limiting their performance. In this work, we design a novel Content Temporal Relation Network (CTRNet) to generate temporal action proposals by exploring the content and temporal semantic relations between proposals simultaneously. Specifically, we design a proposal feature map generation layer to convert the temporal semantic relations of proposals into spatial relations. Based on the proposal feature map, we propose a content-temporal relation module, which applies a novel adaptive-dilated convolution to model the temporal semantic relations between proposals and designs a content-adaptive convolution operation to explore the content semantic relation between proposals. Considering the temporal and content semantic relations between proposals, CTRNet has learned discriminative proposal features to improve performance. Extensive experiments are performed on two mainstream temporal action detection datasets, and CTRNet significantly outperforms the previous state-of-the-art methods. The codes are available at https://github.com/YanZhang-bit/CTRNet.

KW - Proposal–proposal relations

KW - Temporal action detection

KW - Temporal action proposal generation

KW - Untrimmed video analysis

UR - http://www.scopus.com/inward/record.url?scp=85182893345&partnerID=8YFLogxK

U2 - 10.1016/j.patcog.2023.110245

DO - 10.1016/j.patcog.2023.110245

M3 - Article

AN - SCOPUS:85182893345

SN - 0031-3203

VL - 149

JO - Pattern Recognition

JF - Pattern Recognition

M1 - 110245

ER -
