TY - JOUR
T1 - Adaptive depth-aware visual relationship detection
AU - Gan, Ming Gang
AU - He, Yuxuan
N1 - Publisher Copyright:
© 2022 Elsevier B.V.
PY - 2022/7/8
Y1 - 2022/7/8
N2 - Visual relationship detection aims at detecting interactions between objects in flat images, where the visual appearance of, and the spatial relationships between, different objects are two key factors. However, most existing methods extract only 2D object information from flat images, which lacks the depth information present in actual 3D space. To obtain and exploit depth information for visual relationship detection, we construct the Depth VRDs dataset as an extension of the VRD dataset and propose an adaptive depth-aware visual relationship detection network (ADVRD). For visual appearance, we propose a depth-aware visual fusion module that uses additional depth visual information to guide where the RGB visual information should be strengthened. For spatial relationships, to generate a more accurate depth representation when locating an object's depth spatial position, we propose an adaptive depth spatial location method that uses regional information variance to measure the information relevance of each small region within an object's bounding box. Experimental results show that depth information significantly improves the performance of our network on visual relationship detection tasks, especially on zero-shot cases.
AB - Visual relationship detection aims at detecting interactions between objects in flat images, where the visual appearance of, and the spatial relationships between, different objects are two key factors. However, most existing methods extract only 2D object information from flat images, which lacks the depth information present in actual 3D space. To obtain and exploit depth information for visual relationship detection, we construct the Depth VRDs dataset as an extension of the VRD dataset and propose an adaptive depth-aware visual relationship detection network (ADVRD). For visual appearance, we propose a depth-aware visual fusion module that uses additional depth visual information to guide where the RGB visual information should be strengthened. For spatial relationships, to generate a more accurate depth representation when locating an object's depth spatial position, we propose an adaptive depth spatial location method that uses regional information variance to measure the information relevance of each small region within an object's bounding box. Experimental results show that depth information significantly improves the performance of our network on visual relationship detection tasks, especially on zero-shot cases.
KW - Adaptive depth spatial location
KW - Depth-aware visual fusion
KW - Estimated depth maps
KW - Visual relationship detection
UR - http://www.scopus.com/inward/record.url?scp=85129566812&partnerID=8YFLogxK
U2 - 10.1016/j.knosys.2022.108786
DO - 10.1016/j.knosys.2022.108786
M3 - Article
AN - SCOPUS:85129566812
SN - 0950-7051
VL - 247
JO - Knowledge-Based Systems
JF - Knowledge-Based Systems
M1 - 108786
ER -