FANet: Fast and Accurate Robotic Grasp Detection Based on Keypoints

Di Hua Zhai; Sheng Yu; Yuanqing Xia

doi:10.1109/TASE.2023.3272664

School of Automation

Beijing Institute of Technology

Research output: Contribution to journal › Article › peer-review

3 Citations (Scopus)

Abstract

In practice, real-time performance and accuracy are two key metrics for robotic grasp detection. Previous methods have typically sacrificed the real-time performance of the detection network to obtain higher detection accuracy, so achieving both at once remains an open problem. To address it, this paper proposes FANet, a network based on grasp keypoints that improves the accuracy of grasp detection while preserving real-time performance. The central question is how to detect grasp keypoints quickly and accurately. To this end, we propose a local refinement module that optimizes and de-duplicates each feature of the multi-scale feature map, enabling the network to make full use of multi-scale features; a global feature refinement module that allows the network to make better use of global features; and a grasp keypoint optimization module that predicts the offset between the actual keypoints and the predicted keypoints, enabling the network to localize keypoints more accurately. Moreover, we develop two FANet variants, one for grasp detection on CPU and one for GPU, both of which achieve real-time grasp detection in real-world scenes. We train and test FANet on the Cornell and Jacquard datasets, achieving state-of-the-art (SOTA) results on the Jacquard dataset. We also test FANet on a dataset of unknown objects, with good results. Finally, in grasping experiments with a physical Baxter robot, FANet achieves an average grasping success rate of 96%.

Note to Practitioners: Real-time performance and accuracy are two key metrics in robotic grasp detection. Achieving high accuracy usually means spending more time on feature extraction; conversely, improving real-time performance means reducing the time spent on feature extraction, which may lower detection accuracy. How to balance the two, so as to have both, is a problem worth investigating. Current methods tend to prioritize accuracy and accept longer processing times to achieve it. In some practical scenarios, however, such as factory assembly lines where objects move fast, the network must detect the grasp position quickly; real-time performance matters more there, which makes some methods difficult to use. In addition, most current robotic grasp detection methods are GPU-based, and in real-world scenarios a sufficiently powerful GPU may not be available. The CPU, by contrast, is present in every computer and can process images without a high-performance GPU. Compared with GPUs, however, CPUs have weak image processing capability, which makes real-time processing difficult. How to achieve both real-time performance and high accuracy in a CPU-only robotic grasp detection network is therefore worth studying, yet most existing methods ignore this problem. To address these problems, we propose a Fast and Accurate robotic grasp detection Network (FANet), which combines real-time performance with accuracy and supports real-time detection on either CPU or GPU.
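
The abstract states that the grasp keypoint optimization module predicts an offset between predicted and actual keypoints, but it does not say how that offset is applied. As a rough illustration only, the sketch below assumes a common heatmap-plus-offset decoding scheme: pick the strongest cell of a downsampled keypoint heatmap, add a learned sub-cell offset, and scale back to image pixels. The function name `decode_keypoint`, the `stride` parameter, and the grid layout are all hypothetical and are not taken from the paper.

```python
from typing import List, Tuple

Grid = List[List[float]]


def decode_keypoint(
    heatmap: Grid, offset_x: Grid, offset_y: Grid, stride: int = 4
) -> Tuple[float, float, float]:
    """Return (x, y, score) in image pixels for the strongest heatmap cell.

    Assumed scheme (not confirmed by the source): the heatmap is `stride`
    times smaller than the input image, and the offset grids hold the
    fractional part of the keypoint position lost by that downsampling.
    """
    best_score = float("-inf")
    best_row, best_col = 0, 0
    # Find the coarse keypoint location: the cell with the highest score.
    for r, row in enumerate(heatmap):
        for c, score in enumerate(row):
            if score > best_score:
                best_score, best_row, best_col = score, r, c
    # Refine the coarse cell index with the predicted offset, then rescale.
    x = (best_col + offset_x[best_row][best_col]) * stride
    y = (best_row + offset_y[best_row][best_col]) * stride
    return x, y, best_score


if __name__ == "__main__":
    heatmap = [[0.1, 0.2, 0.1], [0.0, 0.3, 0.9], [0.1, 0.1, 0.2]]
    offset_x = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.25], [0.0, 0.0, 0.0]]
    offset_y = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.5], [0.0, 0.0, 0.0]]
    print(decode_keypoint(heatmap, offset_x, offset_y))  # (9.0, 6.0, 0.9)
```

Without the offset, the decoded position in this example would snap to (8.0, 4.0), the corner of the coarse cell; the offset term is what recovers the finer location.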

Original language	English
Pages (from-to)	1-13
Number of pages	13
Journal	IEEE Transactions on Automation Science and Engineering
DOIs	https://doi.org/10.1109/TASE.2023.3272664
Publication status	Accepted/In press - 2023

Keywords

Feature extraction
Grasp detection
Grasping
Object detection
Optimization
Real-time systems
Robot kinematics
Robots
keypoint
robot

Cite this

@article{3c91021fce264416a0731b554170bab6,

title = "FANet: Fast and Accurate Robotic Grasp Detection Based on Keypoints",

keywords = "Feature extraction, Grasp detection, Grasping, Object detection, Optimization, Real-time systems, Robot kinematics, Robots, keypoint, robot",

author = "Zhai, {Di Hua} and Sheng Yu and Yuanqing Xia",

note = "Publisher Copyright: IEEE",

year = "2023",

doi = "10.1109/TASE.2023.3272664",

language = "English",

pages = "1--13",

journal = "IEEE Transactions on Automation Science and Engineering",

issn = "1545-5955",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

}

TY - JOUR

T1 - FANet

T2 - Fast and Accurate Robotic Grasp Detection Based on Keypoints

AU - Zhai, Di Hua

AU - Yu, Sheng

AU - Xia, Yuanqing

N1 - Publisher Copyright: IEEE

PY - 2023

Y1 - 2023

KW - Feature extraction

KW - Grasp detection

KW - Grasping

KW - Object detection

KW - Optimization

KW - Real-time systems

KW - Robot kinematics

KW - Robots

KW - keypoint

KW - robot

UR - http://www.scopus.com/inward/record.url?scp=85162890046&partnerID=8YFLogxK

U2 - 10.1109/TASE.2023.3272664

DO - 10.1109/TASE.2023.3272664

M3 - Article

AN - SCOPUS:85162890046

SN - 1545-5955

SP - 1

EP - 13

JO - IEEE Transactions on Automation Science and Engineering

JF - IEEE Transactions on Automation Science and Engineering

ER -
