TY - JOUR
T1 - Triangulate geometric constraint combined with visual-flow fusion network for accurate 6DoF pose estimation
AU - Jiang, Zhihong
AU - Wang, Xin
AU - Huang, Xiao
AU - Li, Hui
N1 - Publisher Copyright: © 2021 Elsevier B.V.
PY - 2021/4
Y1 - 2021/4
AB - Estimating the 6D pose of an object from a single monocular RGB image is a challenging task in computer vision: occlusion and cluttered environments lead to false positives, and translation prediction is sensitive to changes in image size. In this work, we present TGCPose6D, a novel two-stage method for robust 6DoF object pose estimation that consists of 2D keypoint detection followed by translation refinement. In the first stage, the 2D keypoint regression space is constrained by triangulate geometric feature vectors, and low-quality predictions are suppressed by a center-heatmap weighted loss function, which significantly improves keypoint detection performance. In the second stage, the Visual-Flow Fusion network (VFFNet) extracts visual and optical-flow features from the rendered image and the observed image and predicts the relative translation from the difference between these features. Specifically, VFFNet is trained iteratively so that it learns to predict the relative translation deviation. Extensive experiments demonstrate the effectiveness of the proposed TGCPose6D method, and the overall pose estimation pipeline outperforms state-of-the-art object pose estimation methods on several benchmarks.
KW - 6D object pose estimation
KW - Iterative translation refinement
KW - Triangulate geometric constraint
KW - Visual-flow feature fusion
UR - http://www.scopus.com/inward/record.url?scp=85101566575&partnerID=8YFLogxK
U2 - 10.1016/j.imavis.2021.104127
DO - 10.1016/j.imavis.2021.104127
M3 - Article
AN - SCOPUS:85101566575
SN - 0262-8856
VL - 108
JO - Image and Vision Computing
JF - Image and Vision Computing
M1 - 104127
ER -