Abstract
Grasping objects in unstructured, cluttered environments remains a significant challenge. Although various 6-DoF grasping methods have been developed to tackle this issue, rapidly grasping objects from arbitrary viewpoints is still difficult. In this article, we introduce FS-Grasp, a zero-shot 6-DoF grasp pose estimation method for unstructured cluttered scenes. First, we leverage the zero-shot capabilities of the Segment Anything Model (SAM) to segment objects in cluttered scenes, thereby obtaining point clouds of unknown objects. Next, we design a zero-shot 6-DoF grasp pose prediction algorithm that operates on these object point clouds, enabling the detection of grasp poses for unknown objects in cluttered environments. Within FS-Grasp, we introduce a multiscale, multiangle graspable-region search algorithm that integrates transformers to search comprehensively for graspable poses. We conduct grasping tests across various datasets, and our experimental results demonstrate that FS-Grasp can be effectively applied to most zero-shot grasping tasks. Furthermore, we apply FS-Grasp in diverse human–robot interaction scenarios, establishing an autonomous robot grasping framework based on large visual language models (VLMs) that successfully grasps and places multiple unknown objects, demonstrating the method's practical value.
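The abstract does not say how the object point clouds are derived from the segmentation masks. One common approach, assumed here and not confirmed by the source, is to back-project the masked pixels of an aligned depth image through a pinhole camera model. The sketch below illustrates only that step; the function name `mask_to_point_cloud`, the intrinsics, and the toy data are hypothetical and are not part of FS-Grasp.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def mask_to_point_cloud(
    depth: Sequence[Sequence[float]],
    mask: Sequence[Sequence[bool]],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
) -> List[Point]:
    """Back-project masked depth pixels into 3-D camera-frame points.

    depth: per-pixel depth in metres (0 means no reading).
    mask:  per-pixel booleans, True where the object was segmented.
    fx, fy, cx, cy: pinhole intrinsics (assumed, not from the source).
    """
    points: List[Point] = []
    for v, (depth_row, mask_row) in enumerate(zip(depth, mask)):
        for u, (z, keep) in enumerate(zip(depth_row, mask_row)):
            if not keep or z <= 0.0:
                continue  # skip background pixels and invalid depth
            # Standard pinhole back-projection of pixel (u, v) at depth z.
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points


# Toy 2x3 depth image and mask; all numbers are made up for illustration.
depth = [[1.0, 1.0, 0.0],
         [2.0, 2.0, 2.0]]
mask = [[True, False, True],
        [False, True, True]]
cloud = mask_to_point_cloud(depth, mask, fx=2.0, fy=2.0, cx=1.0, cy=0.0)
```

In a real pipeline the mask would come from a segmentation model and the intrinsics from camera calibration; a vectorized array implementation would normally replace the nested loops.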
| Original language | English |
|---|---|
| Journal | IEEE/ASME Transactions on Mechatronics |
| DOIs | |
| Publication status | Accepted/In press - 2026 |
| Externally published | Yes |
Keywords
- 6-DoF grasp
- human–robot interaction
- segment anything model (SAM)
- visual language model (VLM)
- zero-shot
Title
Fast and Efficient 6-DoF Grasp Estimation With Segment Anything Model in Cluttered Scenes