Language Guided Robotic Grasping with Fine-Grained Instructions

Qiang Sun, Haitao Lin, Ying Fu, Yanwei Fu, Xiangyang Xue

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Given a single RGB image and attribute-rich language instructions, this paper investigates the novel problem of using Fine-grained instructions for Language guided robotic Grasping (FLarG). The challenge lies in learning fine-grained language descriptions well enough to ground target objects. Recent work has made progress on visually grounding objects using only a few coarse attributes [1]. However, these methods perform poorly because they do not align the multi-modal features well, and they do not exploit recent powerful large pre-trained vision and language models such as CLIP. To this end, this paper proposes a FLarG pipeline comprising CLIP-guided object localization and 6-DoF category-level object pose estimation for grasping. Specifically, we first take the CLIP-based segmentation model CRIS as the backbone and propose an end-to-end DyCRIS model that uses a novel dynamic mask strategy to fuse multi-level language and vision features. Then, a well-trained instance segmentation backbone, Mask R-CNN, is adopted to further refine the mask predicted by DyCRIS. Finally, the target object pose is inferred for robotic grasping using a recent 6-DoF object pose estimation method. To validate our CLIP-enhanced pipeline, we also construct a validation dataset for the FLarG task and name it RefNOCS. Extensive results on RefNOCS show the utility and effectiveness of the proposed method. The project homepage is available at https://sunqiang85.github.io/FLarG/.
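
The abstract does not say how the Mask R-CNN output is combined with the DyCRIS prediction. As a minimal sketch of one plausible reading, assuming the refinement step keeps the class-agnostic instance mask that overlaps most with the language-grounded mask, the selection could look like the following. The function names `mask_iou`, `refine_mask`, and `box`, the `min_iou` threshold, and the set-of-pixels mask representation are all hypothetical and are not taken from the paper.

```python
from typing import Iterable, Set, Tuple

Pixel = Tuple[int, int]
Mask = Set[Pixel]  # a mask stored as the set of (row, col) foreground pixels


def mask_iou(a: Mask, b: Mask) -> float:
    """Intersection-over-union of two masks; 0.0 when both are empty."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def refine_mask(coarse: Mask, candidates: Iterable[Mask], min_iou: float = 0.5) -> Mask:
    """Pick the candidate instance mask that best overlaps the coarse mask.

    Assumption: `coarse` stands in for a language-grounded prediction and
    `candidates` for instance masks from a separate segmenter. If no
    candidate reaches `min_iou`, the coarse mask is returned unchanged.
    """
    best = max(candidates, key=lambda m: mask_iou(coarse, m), default=None)
    if best is None or mask_iou(coarse, best) < min_iou:
        return set(coarse)
    return set(best)


def box(r0: int, r1: int, c0: int, c1: int) -> Mask:
    """Helper for the demo: a filled rectangle [r0, r1) x [c0, c1)."""
    return {(r, c) for r in range(r0, r1) for c in range(c0, c1)}


coarse = box(1, 4, 1, 4)   # 9 pixels, rough language-grounded region
inst_a = box(1, 4, 1, 5)   # 12 pixels, fully covers the coarse region
inst_b = box(4, 6, 4, 6)   # 4 pixels, disjoint from the coarse region
refined = refine_mask(coarse, [inst_a, inst_b])
```

Here `refined` equals `inst_a`, since its IoU with the coarse mask (9/12) is above the threshold, while a candidate list containing only `inst_b` would fall back to the coarse mask.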

Original language: English
Title of host publication: 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2023
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 1319-1326
Number of pages: 8
ISBN (Electronic): 9781665491907
DOIs
Publication status: Published - 2023
Event: 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2023 - Detroit, United States
Duration: 1 Oct 2023 to 5 Oct 2023

Publication series

Name: IEEE International Conference on Intelligent Robots and Systems
ISSN (Print): 2153-0858
ISSN (Electronic): 2153-0866

Conference

Conference: 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2023
Country/Territory: United States
City: Detroit
Period: 1/10/23 to 5/10/23
