Abstract
Multi-view images contain abundant visual information and are widely used in computer vision tasks. Existing visual relationship detection (VRD) frameworks extend the feature vector to improve model performance; however, single-view information cannot fully reveal the visual relationships in complex visual scenes. To address this problem and exploit multi-view information for VRD, a novel multi-view VRD framework based on a monocular RGB image and an estimated depth map is proposed. The contributions of this paper are threefold. First, we construct a novel multi-view framework that fuses information from different views extracted from estimated RGB-D images. Second, a multi-view image generation method is proposed to transform the flat visual space into a 3D multi-view space. Third, we redesign the balanced visual relationship classifier so that it processes multi-view feature vectors simultaneously. Detailed experiments on two datasets demonstrate the effectiveness of the multi-view VRD framework; the results show that it achieves state-of-the-art zero-shot learning performance under specific depth conditions.
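As a rough illustration of the multi-view generation idea described above, the sketch below back-projects an estimated depth map into a 3D point cloud using pinhole intrinsics and re-renders it from a virtual camera pose. This is a minimal sketch under assumed conventions: the function names, the intrinsics matrix `K`, the pose `(R, t)`, and the simple z-buffer renderer are illustrative assumptions, not the authors' actual method.

```python
import numpy as np

def backproject_depth(depth, K):
    """Lift a depth map (H, W) to camera-frame 3D points via pinhole intrinsics K."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel coordinates
    z = depth
    x = (u - K[0, 2]) * z / K[0, 0]
    y = (v - K[1, 2]) * z / K[1, 1]
    return np.stack([x, y, z], axis=-1).reshape(-1, 3)  # (H*W, 3)

def render_virtual_view(points, colors, K, R, t, hw):
    """Project colored 3D points into a virtual camera (R, t) with a z-buffer."""
    h, w = hw
    cam = points @ R.T + t          # source frame -> virtual camera frame
    z = cam[:, 2]
    valid = z > 1e-6                # keep only points in front of the camera
    uvw = cam[valid] @ K.T          # homogeneous pixel coordinates
    uv = np.round(uvw[:, :2] / uvw[:, 2:3]).astype(int)
    img = np.zeros((h, w, 3), dtype=colors.dtype)
    zbuf = np.full((h, w), np.inf)
    for (u, v_), zi, c in zip(uv, z[valid], colors[valid]):
        if 0 <= u < w and 0 <= v_ < h and zi < zbuf[v_, u]:
            zbuf[v_, u] = zi        # nearest point wins each pixel
            img[v_, u] = c
    return img

# Hypothetical usage: synthesize a laterally shifted view from an RGB image
# plus a depth map produced by a monocular depth estimator.
# rgb: (H, W, 3) uint8, depth: (H, W) float32
# K = np.array([[f, 0.0, cx], [0.0, f, cy], [0.0, 0.0, 1.0]])
# pts = backproject_depth(depth, K)
# view = render_virtual_view(pts, rgb.reshape(-1, 3), K,
#                            np.eye(3), np.array([0.1, 0.0, 0.0]), depth.shape)
```

Varying the virtual pose `(R, t)` yields additional synthetic viewpoints whose features could then be fused by a multi-view classifier, in the spirit of the framework summarized in the abstract.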
Original language | English |
---|---|
Article number | 4674 |
Journal | Applied Sciences (Switzerland) |
Volume | 12 |
Issue number | 9 |
DOIs | |
Publication status | Published - 1 May 2022 |
Keywords
- RGB-D image
- computer vision
- depth map
- multi-view
- visual relationship detection