Abstract
Six-dimensional (6-D) object pose estimation plays a critical role in robotic grasping, which is widely used in manufacturing. Current state-of-the-art pose estimation techniques primarily rely on keypoint matching: they establish correspondences between 2-D keypoints in an image and their counterparts on a 3-D object model, and then recover the object's 6-D pose with the PnP-RANSAC algorithm. However, this approach is not end-to-end trainable and struggles in scenarios that require differentiable poses, while direct end-to-end regression often yields inferior results. To tackle these problems, we present GR6D, a keypoint- and graph-convolution-based neural network for differentiable pose estimation from RGB-D data. First, we propose a multiscale fusion method that uses convolution and graph convolution to exploit the information contained in RGB and depth images. In addition, we propose a transformer-based pose refinement module that further adjusts features from RGB images and point clouds. We evaluate GR6D on three datasets: 1) LINEMOD; 2) Occlusion LINEMOD; and 3) the YCB-Video dataset, where it outperforms most state-of-the-art methods. Finally, we apply GR6D to pose estimation and robotic grasping in the real world, demonstrating superior performance.
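For context, below is a minimal sketch of the conventional keypoint-based pipeline the abstract critiques, using OpenCV's `cv2.solvePnPRansac`: predicted 2-D keypoints are matched to 3-D model keypoints and the 6-D pose is recovered robustly but non-differentiably. The keypoint coordinates and camera intrinsics are hypothetical placeholders, not values from the paper.

```python
import numpy as np
import cv2

# Hypothetical 3-D keypoints on the object model (meters) and their
# predicted 2-D locations in the image (pixels).
object_points = np.array(
    [[0.00, 0.00, 0.00],
     [0.05, 0.00, 0.00],
     [0.00, 0.05, 0.00],
     [0.00, 0.00, 0.05],
     [0.05, 0.05, 0.00],
     [0.05, 0.00, 0.05]], dtype=np.float64)
image_points = np.array(
    [[320.0, 240.0],
     [380.0, 242.0],
     [318.0, 180.0],
     [322.0, 300.0],
     [378.0, 182.0],
     [382.0, 298.0]], dtype=np.float64)

# Hypothetical pinhole camera intrinsics; zero lens distortion assumed.
K = np.array([[600.0, 0.0, 320.0],
              [0.0, 600.0, 240.0],
              [0.0, 0.0, 1.0]])
dist_coeffs = np.zeros(5)

# RANSAC-based PnP: robust to outlier correspondences, but the RANSAC
# hypothesize-and-verify loop is not differentiable, which is the
# limitation that motivates GR6D's end-to-end design.
ok, rvec, tvec, inliers = cv2.solvePnPRansac(
    object_points, image_points, K, dist_coeffs,
    reprojectionError=3.0, iterationsCount=100)
if ok:
    R, _ = cv2.Rodrigues(rvec)  # rotation matrix of the recovered pose
    print("R =\n", R, "\nt =", tvec.ravel())
```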
Original language | English
---|---
Pages (from-to) | 1-13
Number of pages | 13
Journal | IEEE Transactions on Systems, Man, and Cybernetics: Systems
DOIs |
Publication status | Accepted/In press - 2024
Keywords
- Convolution
- Convolution network
- Feature extraction
- Point cloud compression
- Pose estimation
- Robot kinematics
- Task analysis
- Transformers
- grasp detection
- robot