A Two-Stream CNN With Simultaneous Detection and Segmentation for Robotic Grasping

Yingying Yu, Zhiqiang Cao*, Zhicheng Liu, Wenjie Geng, Junzhi Yu, Weimin Zhang

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

22 Citations (Scopus)

Abstract

Manipulating robots have received much attention for the better services they offer, yet object grasping remains challenging, especially under background interference. In this article, a novel two-stream grasping convolutional neural network (CNN) with simultaneous detection and segmentation is proposed. The method cascades an improved simultaneous detection and segmentation network, BlitzNet, with a two-stream grasping CNN, TsGNet. The improved BlitzNet introduces a channel-based attention mechanism and improves detection and segmentation accuracy by combining the learning of multitask loss weightings with background suppression. Based on the obtained bounding box and segmentation mask of the target object, the target is separated from the background, and the corresponding depth map and grayscale map are sent to TsGNet. By adopting depthwise separable convolution and a designed global deconvolution network, TsGNet achieves the best grasp detection with only a small number of network parameters. This best grasp in the pixel coordinate system is converted to a desired 6-D pose for the robot, which drives the manipulator to execute the grasp. The proposed method combines a grasping CNN with simultaneous detection and segmentation to achieve the best grasp with good adaptability to the background. On the Cornell grasping dataset, the image-wise and object-wise accuracies of the proposed TsGNet are 93.13% and 92.99%, respectively. The effectiveness of the proposed method is verified by experiments.
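The abstract states that the best grasp found in the pixel coordinate system is converted to a desired 6-D pose for the manipulator. The paper's exact conversion is not given here; a minimal sketch of the standard approach is a pinhole-camera back-projection, where `fx`, `fy`, `cx`, `cy` are assumed camera intrinsics, `depth` comes from the depth map at the grasp point, and the gripper is assumed to approach along the camera's optical axis (all assumptions, not from the paper):

```python
import math

def pixel_grasp_to_pose(u, v, depth, theta, fx, fy, cx, cy):
    """Back-project a pixel-space grasp (u, v) with in-plane rotation
    theta into a camera-frame 6-D pose (x, y, z, roll, pitch, yaw).

    u, v   : grasp center in pixels
    depth  : depth at (u, v) in meters (from the depth map)
    theta  : grasp rectangle angle in radians
    fx, fy, cx, cy : pinhole intrinsics (assumed, not from the paper)
    """
    # Standard pinhole back-projection of the grasp center.
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    z = depth
    # Approach along the optical axis: only the in-plane grasp angle
    # contributes to the orientation; roll and pitch are fixed at zero.
    return (x, y, z, 0.0, 0.0, theta)
```

In a full system this camera-frame pose would still be transformed into the robot base frame via a hand-eye calibration before being sent to the manipulator.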

Original language: English
Pages (from-to): 1167-1181
Number of pages: 15
Journal: IEEE Transactions on Systems, Man, and Cybernetics: Systems
Volume: 52
Issue: 2
DOI
Publication status: Published - 1 Feb 2022

