PITAR: An LLM-powered Agent towards Intelligent and Accurate Manipulations in Extended Reality with Multimodal Interactions

  • Haiyan Jiang
  • Dongyu Qiu
  • Ana Stanescu
  • Yidi Wang
  • Henry Been Lirn Duh
  • Frank Guan*

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

We present PITAR, a large language model (LLM)-powered agent for intelligent and accurate manipulations in extended reality (XR) through multimodal interactions. PITAR integrates eye gaze, pointing gestures, and speech, particularly pronoun-based commands, to correctly infer user intent and control virtual objects. Using real-time data from the XR headset and a few-shot prompting strategy, PITAR performs joint reasoning over multimodal signals and a memory of the scenario to identify the target object and determine the desired action and interaction parameters. A prototype VR system implementing PITAR demonstrates intuitive, human-like communication between users and virtual environments, advancing the development of intelligent agents for immersive interaction.
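The abstract describes fusing gaze, gesture, and pronoun-based speech through few-shot prompting to resolve a target object and action. The paper's implementation is not reproduced in this record, so the following minimal Python sketch only illustrates that general idea; every name in it (MultimodalFrame, llm_complete, the example scene objects) is a hypothetical placeholder, not PITAR's actual API.

    # Hypothetical sketch: fuse gaze, pointing, and a pronoun-based spoken
    # command into a few-shot LLM prompt that resolves target, action, and
    # parameters. Names and the example scene are illustrative assumptions.
    import json
    from dataclasses import dataclass

    @dataclass
    class MultimodalFrame:
        gaze_object: str       # object id currently hit by the eye-gaze ray
        pointed_object: str    # object id hit by the pointing-gesture ray
        transcript: str        # recognized speech, e.g. "make it bigger"

    # One worked example showing the LLM the expected JSON answer format.
    FEW_SHOT = (
        'Scene memory: ["red_cube", "blue_sphere"]\n'
        'Gaze: blue_sphere | Pointing: blue_sphere | '
        'Speech: "move that to the left"\n'
        'Answer: {"target": "blue_sphere", "action": "translate", '
        '"params": {"dx": -0.2}}\n'
    )

    def llm_complete(prompt: str) -> str:
        # Placeholder for the real model call (e.g., an API request);
        # returns a canned answer so the sketch runs end to end.
        return ('{"target": "blue_sphere", "action": "scale", '
                '"params": {"factor": 1.5}}')

    def build_prompt(frame: MultimodalFrame, scene_objects: list[str]) -> str:
        """Compose a few-shot prompt asking the LLM to resolve the pronoun
        in the speech command, grounded by gaze and pointing signals."""
        query = (
            f"Scene memory: {json.dumps(scene_objects)}\n"
            f"Gaze: {frame.gaze_object} | Pointing: {frame.pointed_object} "
            f'| Speech: "{frame.transcript}"\n'
            "Answer:"
        )
        return FEW_SHOT + query

    def resolve_intent(frame: MultimodalFrame, scene_objects: list[str]) -> dict:
        raw = llm_complete(build_prompt(frame, scene_objects))
        return json.loads(raw)  # {"target": ..., "action": ..., "params": ...}

    # Usage: a user looks at and points toward the sphere while saying
    # "make it bigger"; the agent resolves "it" and returns a scale action.
    frame = MultimodalFrame("blue_sphere", "blue_sphere", "make it bigger")
    print(resolve_intent(frame, ["red_cube", "blue_sphere"]))

Keeping the answer format as strict JSON, as in the sketch, lets the surrounding XR system parse the model's output directly into an object-manipulation call.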

Original language: English
Title of host publication: Proceedings - SIGGRAPH Asia 2025 XR, SA 2025
Editors: Stephen N. Spencer, Taku Komura, Evan Yifan Peng
Publisher: Association for Computing Machinery, Inc
ISBN (Electronic): 9798400721687
DOIs
Publication status: Published - 14 Dec 2025
Externally published: Yes
Event: SIGGRAPH Asia 2025 XR, SA 2025 - Hong Kong, Hong Kong
Duration: 15 Dec 2025 – 18 Dec 2025

Publication series

Name: Proceedings - SIGGRAPH Asia 2025 XR, SA 2025

Conference

Conference: SIGGRAPH Asia 2025 XR, SA 2025
Country/Territory: Hong Kong
City: Hong Kong
Period: 15/12/25 – 18/12/25
