Gaze Target Prediction with the Understanding of 3D Scenes

  • Leru Gao*
  • Fengxi Sun
  • Yue Liu

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

The goal of gaze target prediction is to determine where a person is looking and the probability that the gaze target falls outside the image. Although prior works have addressed this task by regressing heatmaps centered on the gaze location, they typically fail to incorporate the scene's semantic information. In this work, we first generate a 3D point cloud of the given image based on depth estimation and camera intrinsics. We then combine the point cloud with the estimated 3D gaze vector to generate a 3D field-of-view (FoV) heatmap. Finally, scene contextual cues are merged in to produce the output heatmap. Our method achieves competitive results on the ChildPlay, GazeFollow, and VideoAttentionTarget datasets.
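The first two steps described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: it assumes a pinhole camera model with hypothetical intrinsics `fx`, `fy`, `cx`, `cy`, a depth map already expressed in consistent units, and a simple cosine-based cone as the FoV heatmap. The function names and the `sharpness` exponent are invented for illustration only.

```python
import numpy as np


def backproject_depth(depth, fx, fy, cx, cy):
    """Back-project an (H, W) depth map into an (H, W, 3) point cloud.

    Assumes a pinhole camera; (fx, fy) are focal lengths in pixels and
    (cx, cy) is the principal point. Points are in camera coordinates.
    """
    h, w = depth.shape
    # u indexes columns (x direction), v indexes rows (y direction).
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.stack([x, y, depth], axis=-1)


def fov_heatmap(points, eye, gaze, sharpness=4.0):
    """Score each 3D point by how well it aligns with the gaze direction.

    points: (H, W, 3) point cloud, eye: (3,) eye position,
    gaze: (3,) gaze vector (normalised here). Returns an (H, W) map in [0, 1].
    The cosine-power form is an illustrative assumption, not the paper's formula.
    """
    gaze = gaze / np.linalg.norm(gaze)
    rays = points - eye
    norms = np.linalg.norm(rays, axis=-1, keepdims=True)
    rays = rays / np.maximum(norms, 1e-8)  # avoid division by zero at the eye
    cos = rays @ gaze                      # cosine of angle to the gaze vector
    # Points behind the viewer (negative cosine) get zero weight.
    return np.clip(cos, 0.0, 1.0) ** sharpness


if __name__ == "__main__":
    depth = np.full((5, 5), 2.0)  # toy flat scene two units from the camera
    pts = backproject_depth(depth, fx=2.0, fy=2.0, cx=2.0, cy=2.0)
    heat = fov_heatmap(pts, eye=np.zeros(3), gaze=np.array([0.0, 0.0, 1.0]))
    print(heat.round(3))
```

In this toy setup the eye sits at the camera origin and looks straight along the optical axis, so the heatmap peaks at the image centre and falls off towards the borders. In the setting the abstract describes, the eye position would instead be a 3D point on the observed person, and the gaze vector would come from a gaze estimator.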

Original language: English
Title of host publication: Image and Graphics Technologies and Applications - 20th Chinese Conference, IGTA 2025, Revised Selected Papers
Editors: Yongtian Wang, Yi Chen
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 129-143
Number of pages: 15
ISBN (Print): 9789819549658
Publication status: Published - 2026
Externally published: Yes
Event: 20th Chinese Conference on Image and Graphics Technologies and Applications, IGTA 2025 - Beijing, China
Duration: 9 Aug 2025 to 10 Aug 2025

Publication series

Name: Communications in Computer and Information Science
Volume: 2800 CCIS
ISSN (Print): 1865-0929
ISSN (Electronic): 1865-0937

Conference

Conference: 20th Chinese Conference on Image and Graphics Technologies and Applications, IGTA 2025
Country/Territory: China
City: Beijing
Period: 9/08/25 to 10/08/25

Keywords

  • 3D scene understanding
  • gaze estimation
  • gaze target prediction
