Adaptive depth-aware visual relationship detection

Ming Gang Gan, Yuxuan He*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

6 Citations (Scopus)

Abstract

Visual relationship detection aims at detecting the interactions between objects in flat images, where visual appearance and the spatial relationship between objects are two key factors for detection. However, most existing methods extract only 2D object information from flat images, which lacks the depth information present in actual 3D space. To obtain and utilize depth information for visual relationship detection, we construct the Depth VRDs dataset as an extension of the VRD dataset and propose the adaptive depth-aware visual relationship detection network (ADVRD). In terms of visual appearance, we propose a depth-aware visual fusion module that uses additional depth visual information to guide where the RGB visual information needs to be strengthened. In terms of spatial relationship, to generate a more accurate depth representation when locating an object's depth spatial position, we propose an adaptive depth spatial location method that uses regional information variance to measure the information relevance of each small region in the object bounding box. Experimental results show that depth information can significantly improve the performance of our network on visual relationship detection tasks, especially for zero-shot cases.
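As a rough illustration of the adaptive depth spatial location idea described above, the sketch below divides an object's bounding box on a depth map into small regions, scores each region by its depth variance, and averages the regional depths with inverse-variance weights. The grid size, the inverse-variance weighting, and the function name are assumptions for illustration only, not the paper's exact formulation.

```python
import numpy as np

def adaptive_depth_location(depth_map, box, grid=4, eps=1e-6):
    """Estimate an object's depth by weighting small sub-regions of its
    bounding box according to their depth variance.
    Illustrative sketch only; grid size and inverse-variance weighting
    are assumed, not taken from the paper."""
    x1, y1, x2, y2 = box
    patch = depth_map[y1:y2, x1:x2]
    h, w = patch.shape
    ys = np.linspace(0, h, grid + 1, dtype=int)
    xs = np.linspace(0, w, grid + 1, dtype=int)

    means, weights = [], []
    for i in range(grid):
        for j in range(grid):
            cell = patch[ys[i]:ys[i + 1], xs[j]:xs[j + 1]]
            if cell.size == 0:
                continue
            means.append(cell.mean())
            # Lower depth variance -> more coherent region -> higher relevance.
            weights.append(1.0 / (cell.var() + eps))

    return float(np.average(means, weights=np.asarray(weights)))

# Hypothetical usage: depth_map is an H x W array of per-pixel depths,
# box is (x1, y1, x2, y2) in pixel coordinates.
# depth = adaptive_depth_location(depth_map, (30, 40, 180, 220))
```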

Original language: English
Article number: 108786
Journal: Knowledge-Based Systems
Volume: 247
DOI
Publication status: Published - 8 Jul 2022
