A Novel Robotic Pushing and Grasping Method Based on Vision Transformer and Convolution

Research output: Contribution to journal › Article › peer-review

8 Citations (Scopus)

Abstract

Robotic grasping techniques have been widely studied in recent years. However, grasping in cluttered scenes remains a challenging problem for robots: objects are placed close to each other, leaving no space around them for the gripper, which makes it difficult to find a suitable grasping position. To solve this problem, this article proposes combining pushing and grasping (PG) actions to facilitate grasp pose detection and robot grasping. We propose a combined pushing and grasping network, the PG method based on transformer and convolution (PGTC). For the pushing action, we propose a vision transformer (ViT)-based object position prediction network, the pushing transformer network (PTNet), which captures global and temporal features well and can better predict the positions of objects after pushing. For grasping detection, we propose a cross dense fusion network (CDFNet), which makes full use of the RGB and depth images by fusing and refining them several times. Compared with previous networks, CDFNet detects the optimal grasping position more accurately. Finally, we use the networks in both simulated and real UR3 robot grasping experiments and achieve state-of-the-art (SOTA) performance. Video and dataset are available at https://youtu.be/Q58YE-Cc250.
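The cross dense fusion idea, fusing RGB and depth features and refining them repeatedly before predicting grasps, can be illustrated with a minimal PyTorch sketch. This is not the authors' CDFNet: the module names, channel widths, number of fusion stages, and the single grasp-quality output head below are all illustrative assumptions.

```python
# Minimal sketch of repeated RGB-D cross fusion for grasp detection,
# loosely inspired by the CDFNet idea in the abstract. All names, channel
# widths, and stage counts are assumptions, not the paper's architecture.
import torch
import torch.nn as nn

class CrossFusionStage(nn.Module):
    """One stage: each modality is refined using features from the other."""
    def __init__(self, channels: int):
        super().__init__()
        # 1x1 convs mix the concatenated (own + cross) features back down.
        self.rgb_mix = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=1),
            nn.BatchNorm2d(channels), nn.ReLU(inplace=True))
        self.depth_mix = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=1),
            nn.BatchNorm2d(channels), nn.ReLU(inplace=True))

    def forward(self, rgb, depth):
        rgb_out = self.rgb_mix(torch.cat([rgb, depth], dim=1))
        depth_out = self.depth_mix(torch.cat([depth, rgb], dim=1))
        return rgb_out, depth_out

class TinyRGBDGraspNet(nn.Module):
    """Two-branch encoder with repeated cross fusion and a grasp-quality head."""
    def __init__(self, channels: int = 32, num_stages: int = 3):
        super().__init__()
        self.rgb_stem = nn.Conv2d(3, channels, kernel_size=3, padding=1)
        self.depth_stem = nn.Conv2d(1, channels, kernel_size=3, padding=1)
        self.stages = nn.ModuleList(
            CrossFusionStage(channels) for _ in range(num_stages))
        # Per-pixel grasp-quality map; a real grasp head would also
        # predict grasp angle and gripper width.
        self.head = nn.Conv2d(2 * channels, 1, kernel_size=1)

    def forward(self, rgb, depth):
        r, d = self.rgb_stem(rgb), self.depth_stem(depth)
        for stage in self.stages:  # fuse and refine several times
            r, d = stage(r, d)
        return self.head(torch.cat([r, d], dim=1))

# Usage: peaks in the quality map mark candidate grasp positions.
net = TinyRGBDGraspNet()
rgb = torch.randn(1, 3, 224, 224)
depth = torch.randn(1, 1, 224, 224)
quality = net(rgb, depth)  # shape: (1, 1, 224, 224)
```

The design point this sketch captures is that each modality repeatedly conditions on the other, rather than being concatenated once at the input, which is what lets the fused features be refined over several stages.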

Original language: English
Pages (from-to): 1-14
Number of pages: 14
Journal: IEEE Transactions on Neural Networks and Learning Systems
Publication status: Accepted/In press - 2023

Keywords

  • Convolutional neural network
  • grasping detection
  • robot
  • transformer
