RDBN: Visual relationship detection with inaccurate RGB-D images

Xiaozhou Liu; Ming Gang Gan

doi:10.1016/j.knosys.2020.106142

Xiaozhou Liu, Ming Gang Gan*

*Corresponding author for this work

School of Automation

Beijing Institute of Technology

Research output: Contribution to journal › Article › peer-review

6 Citations (Scopus)

Abstract

Traditional visual relationship detection methods train the semantic network on RGB information alone. This does not match human perception, which combines RGB information with depth information, so these methods lack the generalization ability (zero-shot performance) needed to extract visual relationships in practical scenes. To solve this problem, this paper proposes a novel visual relationship detection framework based on RGB-D images. Because accurate depth maps are difficult to obtain for complex scenes, we propose a fuzzy-strategy-based method to represent depth features of inaccurate depth maps, independent of manual depth annotations. In particular, we formulate the RGB-Depth-Balanced-Network (RDBN), which simultaneously processes RGB features and the corresponding estimated depth maps to counter the inaccuracy of the depth maps, and which extracts semantic information with monocular RGB images as its only input. We conduct ablation experiments to analyze the functions of different visual components and to demonstrate the effectiveness of RDBN. Furthermore, we show that RDBN outperforms state-of-the-art visual relationship detection methods on the Visual Relationship Dataset (VRD) and the UnRel Dataset, both on the zero-shot visual relationship detection task under specific depth conditions and on the task of image retrieval among unusual relationships.

Original language	English
Article number	106142
Journal	Knowledge-Based Systems
Volume	204
DOIs	https://doi.org/10.1016/j.knosys.2020.106142
Publication status	Published - 27 Sept 2020

Keywords

Deep neural network
RGB-D image
Visual relationship detection
Visual scene understanding
Zero-shot learning

Cite this

Liu, X., & Gan, M. G. (2020). RDBN: Visual relationship detection with inaccurate RGB-D images. Knowledge-Based Systems, 204, Article 106142. https://doi.org/10.1016/j.knosys.2020.106142

@article{a43e1e51b23d4ecfaf9bc5faf37fea54,

title = "RDBN: Visual relationship detection with inaccurate RGB-D images",

abstract = "Traditional visual relationship detection methods only use RGB information to train the semantic network, which do not match human habits that we combine RGB information with Depth information to perceive the world, thus, there is not enough generalization ability (zero-shot performance) to extract the visual relationships in practical scenes. To solve this problem, a novel visual relationship detection framework based on RGB-D images is proposed in this paper. Since it is difficult to get accurate depth maps from complex scenes, we propose a fuzzy strategy based method to represent Depth features of inaccurate depth maps which are independent of manual depth annotations. In particular, we formulate the RGB-Depth-Balanced-Network (RDBN) which can simultaneously process RGB features and the corresponding estimated depth maps to counter the inaccuracy of depth maps and extract semantic information by the only input of monocular RGB images. In experiments, we conduct ablation experiments to analyze functions of different visual components to demonstrate the effectiveness of our RDBN. Furthermore, we show that RDBN outperforms state-of-the-art visual relationship detection methods on Visual Relationship Dataset (VRD) and UnRel Dataset when tackling the visual relationship detection task of zero-shot learning in specific depth conditions, and the task of image retrieval among unusual relationships.",

keywords = "Deep neural network, RGB-D image, Visual relationship detection, Visual scene understanding, Zero-shot learning",

author = "Liu, Xiaozhou and Gan, {Ming Gang}",

note = "Publisher Copyright: {\textcopyright} 2020 Elsevier B.V.",

year = "2020",

month = sep,

day = "27",

doi = "10.1016/j.knosys.2020.106142",

language = "English",

volume = "204",

journal = "Knowledge-Based Systems",

issn = "0950-7051",

publisher = "Elsevier B.V.",

}

TY - JOUR

T1 - RDBN: Visual relationship detection with inaccurate RGB-D images

AU - Liu, Xiaozhou

AU - Gan, Ming Gang

PY - 2020/9/27

Y1 - 2020/9/27

N2 - Traditional visual relationship detection methods only use RGB information to train the semantic network, which do not match human habits that we combine RGB information with Depth information to perceive the world, thus, there is not enough generalization ability (zero-shot performance) to extract the visual relationships in practical scenes. To solve this problem, a novel visual relationship detection framework based on RGB-D images is proposed in this paper. Since it is difficult to get accurate depth maps from complex scenes, we propose a fuzzy strategy based method to represent Depth features of inaccurate depth maps which are independent of manual depth annotations. In particular, we formulate the RGB-Depth-Balanced-Network (RDBN) which can simultaneously process RGB features and the corresponding estimated depth maps to counter the inaccuracy of depth maps and extract semantic information by the only input of monocular RGB images. In experiments, we conduct ablation experiments to analyze functions of different visual components to demonstrate the effectiveness of our RDBN. Furthermore, we show that RDBN outperforms state-of-the-art visual relationship detection methods on Visual Relationship Dataset (VRD) and UnRel Dataset when tackling the visual relationship detection task of zero-shot learning in specific depth conditions, and the task of image retrieval among unusual relationships.

AB - Traditional visual relationship detection methods only use RGB information to train the semantic network, which do not match human habits that we combine RGB information with Depth information to perceive the world, thus, there is not enough generalization ability (zero-shot performance) to extract the visual relationships in practical scenes. To solve this problem, a novel visual relationship detection framework based on RGB-D images is proposed in this paper. Since it is difficult to get accurate depth maps from complex scenes, we propose a fuzzy strategy based method to represent Depth features of inaccurate depth maps which are independent of manual depth annotations. In particular, we formulate the RGB-Depth-Balanced-Network (RDBN) which can simultaneously process RGB features and the corresponding estimated depth maps to counter the inaccuracy of depth maps and extract semantic information by the only input of monocular RGB images. In experiments, we conduct ablation experiments to analyze functions of different visual components to demonstrate the effectiveness of our RDBN. Furthermore, we show that RDBN outperforms state-of-the-art visual relationship detection methods on Visual Relationship Dataset (VRD) and UnRel Dataset when tackling the visual relationship detection task of zero-shot learning in specific depth conditions, and the task of image retrieval among unusual relationships.

KW - Deep neural network

KW - RGB-D image

KW - Visual relationship detection

KW - Visual scene understanding

KW - Zero-shot learning

UR - http://www.scopus.com/inward/record.url?scp=85086896550&partnerID=8YFLogxK

U2 - 10.1016/j.knosys.2020.106142

DO - 10.1016/j.knosys.2020.106142

M3 - Article

AN - SCOPUS:85086896550

SN - 0950-7051

VL - 204

JO - Knowledge-Based Systems

JF - Knowledge-Based Systems

M1 - 106142

ER -
