TY - JOUR
T1 - RDBN
T2 - Visual relationship detection with inaccurate RGB-D images
AU - Liu, Xiaozhou
AU - Gan, Ming Gang
N1 - Publisher Copyright: © 2020 Elsevier B.V.
PY - 2020/9/27
Y1 - 2020/9/27
AB - Traditional visual relationship detection methods use only RGB information to train the semantic network. This does not match human perception, which combines RGB information with depth information to perceive the world, so these methods lack sufficient generalization ability (zero-shot performance) to extract visual relationships in practical scenes. To address this problem, this paper proposes a novel visual relationship detection framework based on RGB-D images. Because accurate depth maps are difficult to obtain for complex scenes, we propose a fuzzy-strategy-based method that represents the depth features of inaccurate depth maps without relying on manual depth annotations. In particular, we formulate the RGB-Depth-Balanced-Network (RDBN), which simultaneously processes RGB features and the corresponding estimated depth maps to counter the inaccuracy of those depth maps, and which extracts semantic information using only monocular RGB images as input. We conduct ablation experiments that analyze the functions of different visual components to demonstrate the effectiveness of RDBN. Furthermore, we show that RDBN outperforms state-of-the-art visual relationship detection methods on the Visual Relationship Dataset (VRD) and the UnRel Dataset, both for zero-shot visual relationship detection under specific depth conditions and for image retrieval among unusual relationships.
KW - Deep neural network
KW - RGB-D image
KW - Visual relationship detection
KW - Visual scene understanding
KW - Zero-shot learning
UR - http://www.scopus.com/inward/record.url?scp=85086896550&partnerID=8YFLogxK
U2 - 10.1016/j.knosys.2020.106142
DO - 10.1016/j.knosys.2020.106142
M3 - Article
AN - SCOPUS:85086896550
SN - 0950-7051
VL - 204
JO - Knowledge-Based Systems
JF - Knowledge-Based Systems
M1 - 106142
ER -