An Efficient Robotic Pushing and Grasping Method in Cluttered Scene

Sheng Yu; Di Hua Zhai; Yuanqing Xia; Yuyin Guan

doi:10.1109/TCYB.2024.3381639

School of Automation

Research output: Contribution to journal › Article › peer-review

Abstract

Pushing and grasping (PG) are crucial skills for intelligent robots, enabling them to perform complex grasping tasks in a variety of scenarios. Existing PG methods can be categorized into single-stage and multistage approaches: single-stage methods are faster but less accurate, whereas multistage methods achieve high accuracy at the expense of time efficiency. To address this trade-off, this article proposes a novel end-to-end PG method, the efficient PG network (EPGNet), which achieves high accuracy and efficiency simultaneously. To maintain performance with fewer parameters, EfficientNet-B0 is used as the backbone of EPGNet. In addition, a novel cross-fusion module is introduced to improve network performance in robotic PG tasks; it fuses local and global features, helping the network handle objects of varying sizes in different scenes. EPGNet has two branches, one predicting pushing actions and the other predicting grasping actions, and both branches are trained simultaneously within a Q-learning framework. Training data are collected through trial and error as the robot performs PG actions. To bridge the gap between simulation and reality, a unique PG dataset is proposed, and a YOLACT network is trained on it for object detection and segmentation. A comprehensive set of experiments is conducted in simulated and real-world environments. The results demonstrate that EPGNet outperforms single-stage methods and achieves performance competitive with multistage methods while using fewer parameters. A video is available at https://youtu.be/HNKJjQH0MPc.
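The abstract says only that EPGNet has two branches, one for pushing and one for grasping, trained jointly with Q-learning. The sketch below illustrates one common way such a two-branch learner could choose and score actions. It assumes each branch outputs a pixel-wise Q map over a set of discrete end-effector rotations (a parameterization used in earlier visual pushing-for-grasping work) and a one-step Q-learning target. The map shapes, the discount `gamma = 0.5`, the Huber loss, and the function names `select_action`, `td_target`, and `huber_loss` are hypothetical illustrations, not EPGNet's published implementation.

```python
import numpy as np

# Hypothetical sketch: each branch outputs a Q map with shape
# (num_rotations, height, width). This layout is an assumption, not taken
# from the EPGNet paper.


def select_action(push_q: np.ndarray, grasp_q: np.ndarray):
    """Greedily pick the primitive and (rotation, row, col) with the highest Q.

    Ties between the two branches are resolved in favor of grasping.
    """
    if grasp_q.max() >= push_q.max():
        primitive, q_map = "grasp", grasp_q
    else:
        primitive, q_map = "push", push_q
    # Convert the flat argmax into a (rotation, row, col) index tuple of ints.
    idx = tuple(int(i) for i in np.unravel_index(np.argmax(q_map), q_map.shape))
    return primitive, idx, float(q_map[idx])


def td_target(reward: float, next_push_q: np.ndarray, next_grasp_q: np.ndarray,
              gamma: float = 0.5, terminal: bool = False) -> float:
    """One-step Q-learning target: r + gamma * max over both next-state maps."""
    if terminal:
        return float(reward)
    next_best = max(float(next_push_q.max()), float(next_grasp_q.max()))
    return float(reward) + gamma * next_best


def huber_loss(predicted: float, target: float, delta: float = 1.0) -> float:
    """Huber (smooth L1) loss, applied only to the executed action's Q value."""
    err = abs(predicted - target)
    if err <= delta:
        return 0.5 * err ** 2
    return delta * (err - 0.5 * delta)


if __name__ == "__main__":
    # Toy maps with 2 rotations on a 3x3 grid.
    push = np.zeros((2, 3, 3))
    push[1, 0, 2] = 0.4
    grasp = np.zeros((2, 3, 3))
    grasp[0, 2, 1] = 0.9
    print(select_action(push, grasp))  # ('grasp', (0, 2, 1), 0.9)
```

Selecting the maximum across both maps lets a single greedy policy decide between pushing and grasping, and the shared target couples the two branches so that a push is valued by the best action it enables in the next state.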

Original language	English
Pages (from-to)	1-14
Number of pages	14
Journal	IEEE Transactions on Cybernetics
DOIs	https://doi.org/10.1109/TCYB.2024.3381639
Publication status	Accepted/In press - 2024

Keywords

Convolutional neural network
Grasping
Heating systems
Object recognition
Robot kinematics
Robots
Task analysis
Training
deep reinforcement learning (DRL)
grasping detection
robot


Cite this

@article{77ea7fcabffb46bf8c38895b12ade173,
  title = "An Efficient Robotic Pushing and Grasping Method in Cluttered Scene",
  abstract = "Pushing and grasping (PG) are crucial skills for intelligent robots. These skills enable robots to perform complex grasping tasks in various scenarios. These PG methods can be categorized into single-stage and multistage approaches. Single-stage methods are faster but less accurate, while multistage methods offer high accuracy at the expense of time efficiency. To address this issue, a novel end-to-end PG method called efficient PG network (EPGNet) is proposed in this article. EPGNet achieves both high accuracy and efficiency simultaneously. To optimize performance with fewer parameters, EfficientNet-B0 is used as the backbone of EPGNet. Additionally, a novel cross-fusion module is introduced to enhance network performance in robotic PG tasks. This module fuses and utilizes local and global features, aiding the network in handling objects of varying sizes in different scenes. EPGNet consists of two branches dedicated to predicting PG actions, respectively. Both branches are trained simultaneously within a $Q$-learning framework. Training data is collected through trial and error, involving the robot performing PG actions. To bridge the gap between simulation and reality, a unique PG dataset is proposed. Additionally, a YOLACT network is trained on the PG dataset to facilitate object detection and segmentation. A comprehensive set of experiments is conducted in simulated environments and real-world scenarios. The results demonstrate that EPGNet outperforms single-stage methods and offers competitive performance compared to multistage methods, all while utilizing fewer parameters. A video is available at https://youtu.be/HNKJjQH0MPc.",
  keywords = "Convolutional neural network, Grasping, Heating systems, Object recognition, Robot kinematics, Robots, Task analysis, Training, deep reinforcement learning (DRL), grasping detection, robot",
  author = "Yu, Sheng and Zhai, {Di Hua} and Xia, Yuanqing and Guan, Yuyin",
  note = "Publisher Copyright: IEEE",
  year = "2024",
  doi = "10.1109/TCYB.2024.3381639",
  language = "English",
  pages = "1--14",
  journal = "IEEE Transactions on Cybernetics",
  issn = "2168-2267",
  publisher = "IEEE Advancing Technology for Humanity",
}

TY  - JOUR
T1  - An Efficient Robotic Pushing and Grasping Method in Cluttered Scene
AU  - Yu, Sheng
AU  - Zhai, Di Hua
AU  - Xia, Yuanqing
AU  - Guan, Yuyin
N1  - Publisher Copyright: IEEE
PY  - 2024
Y1  - 2024
N2  - Pushing and grasping (PG) are crucial skills for intelligent robots. These skills enable robots to perform complex grasping tasks in various scenarios. These PG methods can be categorized into single-stage and multistage approaches. Single-stage methods are faster but less accurate, while multistage methods offer high accuracy at the expense of time efficiency. To address this issue, a novel end-to-end PG method called efficient PG network (EPGNet) is proposed in this article. EPGNet achieves both high accuracy and efficiency simultaneously. To optimize performance with fewer parameters, EfficientNet-B0 is used as the backbone of EPGNet. Additionally, a novel cross-fusion module is introduced to enhance network performance in robotic PG tasks. This module fuses and utilizes local and global features, aiding the network in handling objects of varying sizes in different scenes. EPGNet consists of two branches dedicated to predicting PG actions, respectively. Both branches are trained simultaneously within a $Q$-learning framework. Training data is collected through trial and error, involving the robot performing PG actions. To bridge the gap between simulation and reality, a unique PG dataset is proposed. Additionally, a YOLACT network is trained on the PG dataset to facilitate object detection and segmentation. A comprehensive set of experiments is conducted in simulated environments and real-world scenarios. The results demonstrate that EPGNet outperforms single-stage methods and offers competitive performance compared to multistage methods, all while utilizing fewer parameters. A video is available at https://youtu.be/HNKJjQH0MPc.
AB  - Pushing and grasping (PG) are crucial skills for intelligent robots. These skills enable robots to perform complex grasping tasks in various scenarios. These PG methods can be categorized into single-stage and multistage approaches. Single-stage methods are faster but less accurate, while multistage methods offer high accuracy at the expense of time efficiency. To address this issue, a novel end-to-end PG method called efficient PG network (EPGNet) is proposed in this article. EPGNet achieves both high accuracy and efficiency simultaneously. To optimize performance with fewer parameters, EfficientNet-B0 is used as the backbone of EPGNet. Additionally, a novel cross-fusion module is introduced to enhance network performance in robotic PG tasks. This module fuses and utilizes local and global features, aiding the network in handling objects of varying sizes in different scenes. EPGNet consists of two branches dedicated to predicting PG actions, respectively. Both branches are trained simultaneously within a $Q$-learning framework. Training data is collected through trial and error, involving the robot performing PG actions. To bridge the gap between simulation and reality, a unique PG dataset is proposed. Additionally, a YOLACT network is trained on the PG dataset to facilitate object detection and segmentation. A comprehensive set of experiments is conducted in simulated environments and real-world scenarios. The results demonstrate that EPGNet outperforms single-stage methods and offers competitive performance compared to multistage methods, all while utilizing fewer parameters. A video is available at https://youtu.be/HNKJjQH0MPc.
KW  - Convolutional neural network
KW  - Grasping
KW  - Heating systems
KW  - Object recognition
KW  - Robot kinematics
KW  - Robots
KW  - Task analysis
KW  - Training
KW  - deep reinforcement learning (DRL)
KW  - grasping detection
KW  - robot
UR  - http://www.scopus.com/inward/record.url?scp=85190729543&partnerID=8YFLogxK
U2  - 10.1109/TCYB.2024.3381639
DO  - 10.1109/TCYB.2024.3381639
M3  - Article
AN  - SCOPUS:85190729543
SN  - 2168-2267
SP  - 1
EP  - 14
JO  - IEEE Transactions on Cybernetics
JF  - IEEE Transactions on Cybernetics
ER  - 
