TY - JOUR
T1 - A Novel Robotic Pushing and Grasping Method Based on Vision Transformer and Convolution
AU - Yu, Sheng
AU - Zhai, Di-Hua
AU - Xia, Yuanqing
N1 - Publisher Copyright: IEEE
PY - 2023
Y1 - 2023
AB - Robotic grasping techniques have been widely studied in recent years. However, grasping in cluttered scenes remains a challenging problem for robots. In such scenes, objects are placed close to each other, leaving no space for the robot to place its gripper, which makes it difficult to find a suitable grasping position. To solve this problem, this article proposes combining pushing and grasping (PG) actions to aid grasp pose detection and robot grasping. We propose a combined pushing–grasping network: the PG method based on transformer and convolution (PGTC). For the pushing action, we propose a vision transformer (ViT)-based object position prediction network, the pushing transformer network (PTNet), which captures global and temporal features well and better predicts the positions of objects after pushing. For grasping detection, we propose a cross dense fusion network (CDFNet), which makes full use of the RGB and depth images, fusing and refining them several times. Compared with previous networks, CDFNet detects the optimal grasping position more accurately. Finally, we use the network in both simulation and real UR3 robot grasping experiments and achieve state-of-the-art (SOTA) performance. Video and dataset are available at https://youtu.be/Q58YE-Cc250.
KW - Convolutional neural network
KW - grasping detection
KW - robot
KW - transformer
UR - http://www.scopus.com/inward/record.url?scp=85149892329&partnerID=8YFLogxK
U2 - 10.1109/TNNLS.2023.3244186
DO - 10.1109/TNNLS.2023.3244186
M3 - Article
AN - SCOPUS:85149892329
SN - 2162-237X
SP - 1
EP - 14
JO - IEEE Transactions on Neural Networks and Learning Systems
JF - IEEE Transactions on Neural Networks and Learning Systems
ER -