FreePose: Zero-Shot 6D Object Pose Estimation Using Pretrained Foundation Models

Research output: Contribution to journal › Article › peer-review

Abstract

Accurate 6D object pose estimation is essential for robotic manipulation and augmented reality applications. Existing methods typically require extensive training for new objects, limiting their effectiveness in dynamic environments where new objects are frequently introduced. In this paper, we propose FreePose, an efficient training-free zero-shot 6D pose estimation method that leverages pretrained visual and geometric foundation models. Our approach includes an offline onboarding stage, in which templates of a reference object are rendered from multiple viewpoints, and visual and geometric features are then extracted using pretrained visual and geometric models, respectively. The visual features are back-projected onto their corresponding 3D points, enabling a precise alignment between appearance and geometry, and are subsequently fused with the geometric features to form a robust unified representation. During the inference stage, target object instances are segmented from an RGB-D image using SAM2 coupled with an object-matching algorithm. The visual features of each target instance are similarly extracted, back-projected, and fused with geometric features. Robust 3D-3D correspondences are then established using nearest-neighbor search. Finally, the pose is estimated using the TEASER registration algorithm. Extensive evaluations on the BOP5 core datasets show that our approach achieves results comparable to state-of-the-art methods. To highlight the effectiveness and potential of FreePose in real-world scenarios, we deploy it on a real UR3 robot to perform grasping experiments, reaching a grasp success rate of 65.0%.
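The back-projection, fusion, and nearest-neighbor steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the pinhole intrinsics, the toy descriptors, and the choice of fusing by concatenating L2-normalized vectors are all assumptions made for illustration, since the abstract does not specify them. Segmentation with SAM2, feature extraction with foundation models, and TEASER registration are not shown.

```python
import math


def back_project(u, v, depth, fx, fy, cx, cy):
    """Lift pixel (u, v) with metric depth to a 3D camera-frame point (pinhole model)."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)


def l2_normalize(vec):
    """Scale a descriptor to unit length; zero vectors are returned unchanged."""
    norm = math.sqrt(sum(c * c for c in vec))
    return [c / norm for c in vec] if norm > 0 else list(vec)


def fuse(visual, geometric):
    """Assumed fusion rule: concatenate the two L2-normalized descriptors."""
    return l2_normalize(visual) + l2_normalize(geometric)


def nearest_neighbor_matches(query_feats, template_feats):
    """For each query descriptor, return the index of the closest template descriptor."""
    matches = []
    for q in query_feats:
        dists = [math.dist(q, t) for t in template_feats]
        matches.append(dists.index(min(dists)))
    return matches


# Hypothetical camera intrinsics and (u, v, depth) samples for the template and the query.
intrinsics = dict(fx=500.0, fy=500.0, cx=320.0, cy=240.0)
template_pixels = [(320, 240, 1.0), (420, 240, 1.0), (320, 340, 2.0)]
query_pixels = [(300, 200, 1.5), (350, 260, 1.5)]

template_points = [back_project(u, v, d, **intrinsics) for u, v, d in template_pixels]
query_points = [back_project(u, v, d, **intrinsics) for u, v, d in query_pixels]

# Made-up per-point descriptors standing in for foundation-model features.
template_feats = [
    fuse([1.0, 0.0], [0.0, 1.0]),
    fuse([0.0, 1.0], [1.0, 0.0]),
    fuse([1.0, 1.0], [1.0, 1.0]),
]
query_feats = [
    fuse([0.9, 1.1], [1.0, 1.0]),   # slightly perturbed copy of template descriptor 2
    fuse([1.0, 0.05], [0.0, 1.0]),  # slightly perturbed copy of template descriptor 0
]

# 3D-3D correspondences: each query point paired with its matched template point.
matches = nearest_neighbor_matches(query_feats, template_feats)
correspondences = [(query_points[qi], template_points[ti]) for qi, ti in enumerate(matches)]
```

In a full pipeline, correspondences of this kind would be passed to a robust registration solver to recover the rotation and translation. A brute-force search is used here only for clarity; a spatial index or approximate nearest-neighbor library would be the usual choice at realistic point counts.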

Original language: English
Journal: IEEE Transactions on Circuits and Systems for Video Technology
DOIs
Publication status: Accepted/In press - 2025
Externally published: Yes

Keywords

  • 6D object pose estimation
  • unseen objects
  • feature fusion
  • foundation models
  • free-trained
