TY - JOUR
T1 - Concept-Enhanced Relation Network for Video Visual Relation Inference
AU - Cao, Qianwen
AU - Huang, Heyan
AU - Ren, Mucheng
AU - Yuan, Changsen
N1 - Publisher Copyright:
© 2023 IEEE.
PY - 2023/5/1
Y1 - 2023/5/1
N2 - Video visual relation inference aims at extracting relation triplets in the form of <subject-predicate-object> from videos. With the development of deep learning, existing approaches are built on data-driven neural networks. However, the datasets are often biased in terms of objects and relation triplets, which makes relation inference challenging. Existing approaches typically describe relationships through visual, spatial, and semantic characteristics. The semantic description plays a key role in indicating the potential linguistic connections between objects, which are crucial for transferring knowledge across relationships, especially for determining novel relations. However, in these works, the semantic features are not emphasized but are simply obtained by mapping object labels, which cannot reflect sufficient linguistic meaning. To alleviate the above issues, we propose a novel network, termed Concept-Enhanced Relation Network (CERN), to facilitate video visual relation inference. Thanks to the attributes and linguistic contexts implied in concepts, semantic representations aggregated with the related concept knowledge of objects benefit relation inference. To this end, we incorporate retrieved concepts with the local semantics of objects via a gating mechanism to generate concept-enhanced semantic representations. Extensive experimental results show that our approach achieves state-of-the-art performance on two public datasets: ImageNet-VidVRD and VidOR.
AB - Video visual relation inference aims at extracting relation triplets in the form of <subject-predicate-object> from videos. With the development of deep learning, existing approaches are built on data-driven neural networks. However, the datasets are often biased in terms of objects and relation triplets, which makes relation inference challenging. Existing approaches typically describe relationships through visual, spatial, and semantic characteristics. The semantic description plays a key role in indicating the potential linguistic connections between objects, which are crucial for transferring knowledge across relationships, especially for determining novel relations. However, in these works, the semantic features are not emphasized but are simply obtained by mapping object labels, which cannot reflect sufficient linguistic meaning. To alleviate the above issues, we propose a novel network, termed Concept-Enhanced Relation Network (CERN), to facilitate video visual relation inference. Thanks to the attributes and linguistic contexts implied in concepts, semantic representations aggregated with the related concept knowledge of objects benefit relation inference. To this end, we incorporate retrieved concepts with the local semantics of objects via a gating mechanism to generate concept-enhanced semantic representations. Extensive experimental results show that our approach achieves state-of-the-art performance on two public datasets: ImageNet-VidVRD and VidOR.
KW - Video visual relation inference
KW - concept knowledge base
KW - feature learning
KW - neural network
KW - visual understanding
UR - http://www.scopus.com/inward/record.url?scp=85141560636&partnerID=8YFLogxK
U2 - 10.1109/TCSVT.2022.3220426
DO - 10.1109/TCSVT.2022.3220426
M3 - Article
AN - SCOPUS:85141560636
SN - 1051-8215
VL - 33
SP - 2233
EP - 2244
JO - IEEE Transactions on Circuits and Systems for Video Technology
JF - IEEE Transactions on Circuits and Systems for Video Technology
IS - 5
ER -