Triangulate geometric constraint combined with visual-flow fusion network for accurate 6DoF pose estimation

Zhihong Jiang; Xin Wang; Xiao Huang; Hui Li

doi:10.1016/j.imavis.2021.104127

Zhihong Jiang, Xin Wang, Xiao Huang^*, Hui Li

^*Corresponding author for this work

Advanced Innovation Center for Intelligent Robots and Systems

Research output: Contribution to journal › Article › peer-review

10 Citations (Scopus)

Abstract

Estimating the 6D pose of an object from a monocular RGB image is a challenging task in computer vision: occlusion and cluttered environments produce false positives, and the predicted translation is affected by changes in image size. In this work, we present TGCPose6D, a novel two-stage method for robust 6DoF object pose estimation composed of 2D keypoint detection and translation refinement. In the first stage, the 2D keypoint regression space is constrained by triangulate geometric feature vectors, and low-quality predictions are suppressed by a center-heatmap weighted loss function, which significantly improves keypoint detection performance. In the second stage, the Visual-Flow Fusion network (VFFNet) extracts visual and optical-flow features from the rendered image and the observed image and predicts the relative translation from the difference between those features. Specifically, VFFNet is trained iteratively to learn to predict the relative translation deviation. Extensive experiments demonstrate the effectiveness of the proposed TGCPose6D method. Our overall pose estimation pipeline outperforms state-of-the-art object pose estimation methods on several benchmarks.

Original language	English
Article number	104127
Journal	Image and Vision Computing
Volume	108
DOI	https://doi.org/10.1016/j.imavis.2021.104127
Publication status	Published - Apr 2021

Cite this

@article{1d655da135df4e89b2b5ad93dc1052ab,

title = "Triangulate geometric constraint combined with visual-flow fusion network for accurate 6DoF pose estimation",

abstract = "Estimating the 6D object pose based on a monocular RGB image is a challenging task in computer vision, which produces false positives under the influence of occlusion or cluttered environments. In addition, the prediction of translation is affected by changes of the image size. In this work, we present a novel two-stage method TGCPose6D for robust 6DoF object pose estimation which is composed of 2D keypoint detection and translation refinement. In the first stage, the 2D keypoint regression space is constrained by triangulate geometric feature vectors, and the low-quality prediction is suppressed by the center-heatmap weighted loss function, thereby the performance of keypoint detection is significantly improved. In the second stage, the Visual-Flow Fusion network (VFFNet) is used to extract the visual feature and optical flow feature of the rendered image and the observed image, and to predict the relative translation based on the difference of features. Specifically, the VFFNet is trained iteratively to gain the ability to predict the relative translation deviation. Extensive experiments are conducted to demonstrate the effectiveness of the proposed TGCPose6D method. Our overall pose estimation pipeline outperforms state-of-the-art object pose estimation methods on several benchmarks.",

keywords = "6D object pose estimation, Iterative translation refinement, Triangulate geometric constraint, Visual-flow feature fusion",

author = "Zhihong Jiang and Xin Wang and Xiao Huang and Hui Li",

note = "Publisher Copyright: {\textcopyright} 2021 Elsevier B.V.",

year = "2021",

month = apr,

doi = "10.1016/j.imavis.2021.104127",

language = "English",

volume = "108",

journal = "Image and Vision Computing",

issn = "0262-8856",

publisher = "Elsevier Ltd.",

}

TY - JOUR

T1 - Triangulate geometric constraint combined with visual-flow fusion network for accurate 6DoF pose estimation

AU - Jiang, Zhihong

AU - Wang, Xin

AU - Huang, Xiao

AU - Li, Hui

PY - 2021/4

Y1 - 2021/4

N2 - Estimating the 6D object pose based on a monocular RGB image is a challenging task in computer vision, which produces false positives under the influence of occlusion or cluttered environments. In addition, the prediction of translation is affected by changes of the image size. In this work, we present a novel two-stage method TGCPose6D for robust 6DoF object pose estimation which is composed of 2D keypoint detection and translation refinement. In the first stage, the 2D keypoint regression space is constrained by triangulate geometric feature vectors, and the low-quality prediction is suppressed by the center-heatmap weighted loss function, thereby the performance of keypoint detection is significantly improved. In the second stage, the Visual-Flow Fusion network (VFFNet) is used to extract the visual feature and optical flow feature of the rendered image and the observed image, and to predict the relative translation based on the difference of features. Specifically, the VFFNet is trained iteratively to gain the ability to predict the relative translation deviation. Extensive experiments are conducted to demonstrate the effectiveness of the proposed TGCPose6D method. Our overall pose estimation pipeline outperforms state-of-the-art object pose estimation methods on several benchmarks.

AB - Estimating the 6D object pose based on a monocular RGB image is a challenging task in computer vision, which produces false positives under the influence of occlusion or cluttered environments. In addition, the prediction of translation is affected by changes of the image size. In this work, we present a novel two-stage method TGCPose6D for robust 6DoF object pose estimation which is composed of 2D keypoint detection and translation refinement. In the first stage, the 2D keypoint regression space is constrained by triangulate geometric feature vectors, and the low-quality prediction is suppressed by the center-heatmap weighted loss function, thereby the performance of keypoint detection is significantly improved. In the second stage, the Visual-Flow Fusion network (VFFNet) is used to extract the visual feature and optical flow feature of the rendered image and the observed image, and to predict the relative translation based on the difference of features. Specifically, the VFFNet is trained iteratively to gain the ability to predict the relative translation deviation. Extensive experiments are conducted to demonstrate the effectiveness of the proposed TGCPose6D method. Our overall pose estimation pipeline outperforms state-of-the-art object pose estimation methods on several benchmarks.

KW - 6D object pose estimation

KW - Iterative translation refinement

KW - Triangulate geometric constraint

KW - Visual-flow feature fusion

UR - http://www.scopus.com/inward/record.url?scp=85101566575&partnerID=8YFLogxK

U2 - 10.1016/j.imavis.2021.104127

DO - 10.1016/j.imavis.2021.104127

M3 - Article

AN - SCOPUS:85101566575

SN - 0262-8856

VL - 108

JO - Image and Vision Computing

JF - Image and Vision Computing

M1 - 104127

ER -
