Active Scene Recognition for Domestic Robots: Observing, Moving, and Recognizing

  • Shaopeng Liu
  • Chao Huang*
  • Hailong Huang
  • Jingda Wu

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

In this article, we focus on the challenging problem of robot scene recognition (SR) with uncertain viewpoints and positions in unknown domestic environments. Inspired by active vision, we propose an active scene recognition (ASR) method that integrates active view changing, formulated as a Markov decision process, with SR based on multiview images. We design a deep Q-learning-based action model that generates suitable movement actions, adjusting the robot's observation to acquire informative multiview images for SR. To handle these scene images, we introduce a multiview SR model. This model includes a scene score model (SSM) that rates each image and a scene prediction module (SPM) that determines the SR result and automatically stops the robot's actions to improve SR efficiency. To train the recognition model, we devise a method for generating multiview scene images, creating ample training data from existing scene datasets without time-consuming manual image capture. We conducted comparative experiments and ablation studies in numerous simulated domestic environments to extensively evaluate the ASR method. The results indicate that our method surpasses current SR methods in accuracy and efficiency. Furthermore, SR experiments with a TurtleBot 4 robot in a real-world domestic environment validate the effectiveness of our method.
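To make the observe, move, and recognize loop more concrete, here is a minimal sketch under stated assumptions. It is not the authors' implementation. The scene labels, the action names, the product-rule fusion of per-view scores, the confidence threshold, and the view budget are all hypothetical stand-ins for the SSM, the SPM, and the stopping rule. The action side shows only generic epsilon-greedy selection and the standard deep Q-learning target, not the paper's network or reward design.

```python
import math
import random
from typing import List, Sequence, Tuple

# Hypothetical label and action sets, for illustration only.
SCENES = ["bedroom", "kitchen", "living_room", "bathroom"]
ACTIONS = ["move_forward", "turn_left", "turn_right"]


def fuse_scores(view_scores: Sequence[Sequence[float]]) -> List[float]:
    """Fuse per-view class probabilities (stand-in for SSM outputs) by summing
    log-probabilities (a product rule) and renormalizing to a distribution."""
    n_classes = len(view_scores[0])
    log_sum = [0.0] * n_classes
    for probs in view_scores:
        for c, p in enumerate(probs):
            log_sum[c] += math.log(max(p, 1e-12))  # clamp to avoid log(0)
    m = max(log_sum)  # subtract the max before exp for numerical stability
    exps = [math.exp(v - m) for v in log_sum]
    total = sum(exps)
    return [e / total for e in exps]


def predict_or_continue(view_scores: Sequence[Sequence[float]],
                        threshold: float = 0.9,
                        max_views: int = 5) -> Tuple[str, float, bool]:
    """Hypothetical SPM-style rule: return (label, confidence, stop).
    Stop once fused confidence reaches `threshold` or the view budget is used."""
    fused = fuse_scores(view_scores)
    best = max(range(len(fused)), key=fused.__getitem__)
    confidence = fused[best]
    stop = confidence >= threshold or len(view_scores) >= max_views
    return SCENES[best], confidence, stop


def epsilon_greedy(q_values: Sequence[float], epsilon: float,
                   rng: random.Random) -> int:
    """Pick a random action index with probability epsilon, else the argmax."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=q_values.__getitem__)


def dqn_target(reward: float, next_q_values: Sequence[float],
               gamma: float = 0.99, done: bool = False) -> float:
    """Standard deep Q-learning regression target: r + gamma * max_a' Q(s', a')."""
    return reward if done else reward + gamma * max(next_q_values)


if __name__ == "__main__":
    # Made-up per-view probabilities for two successive observations.
    views = [[0.1, 0.6, 0.2, 0.1], [0.05, 0.7, 0.15, 0.1]]
    for k in range(1, len(views) + 1):
        label, conf, stop = predict_or_continue(views[:k])
        print(k, label, round(conf, 3), stop)
    action = epsilon_greedy([0.1, 0.5, 0.2], epsilon=0.1, rng=random.Random(0))
    print(ACTIONS[action])
```

With these made-up scores, one view gives "kitchen" at 0.6 confidence, so the loop would keep moving. After a second view the fused confidence rises to about 0.903, which crosses the assumed 0.9 threshold and stops the episode. Product-rule fusion is only one simple choice for combining views; a learned fusion module could equally fill this role.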

Original language: English
Pages (from-to): 9591-9603
Number of pages: 13
Journal: IEEE Transactions on Systems, Man, and Cybernetics: Systems
Volume: 55
Issue number: 12
Publication status: Published - 2025
Externally published: Yes

Keywords

  • Data generation
  • domestic robot
  • multi-input recognition model
  • robot active vision
  • scene recognition (SR)
