In Situ 3D Scene Synthesis for Ubiquitous Embodied Interfaces

Haiyan Jiang, Leiyu Song*, Dongdong Weng, Zhe Sun, Huiying Li, Xiaonuo Dongye, Zhenliang Zhang*

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

1 Citation (Scopus)

Abstract

Virtual reality enables us to access and interact with immersive virtual environments anytime and anywhere, in fields such as entertainment, training, and education. However, users immersed in virtual scenes remain physically connected to their real-world surroundings, which can pose challenges to safety and immersion. Although virtual scene synthesis has attracted widespread attention, many popular methods are limited to generating purely virtual scenes independent of the physical environment, or simply map physical objects as obstacles. To this end, we propose a scene agent that synthesizes situated 3D virtual scenes as a kind of ubiquitous embodied interface in VR. The scene agent synthesizes scenes by perceiving the user's physical environment and inferring the user's demands. The synthesized scenes preserve the affordances of the physical environment, enabling immersed users to interact with the physical environment and improving their sense of security. Meanwhile, the synthesized scenes maintain the style described by the user, improving immersion. Comparison results show that the proposed scene agent synthesizes virtual scenes with better affordance maintenance, scene diversity, style maintenance, and 3D intersection over union than the baselines. To the best of our knowledge, this is the first work to achieve in situ scene synthesis with virtual-real affordance consistency while accounting for user demand.

Original language: English
Title of host publication: MM 2024 - Proceedings of the 32nd ACM International Conference on Multimedia
Publisher: Association for Computing Machinery, Inc
Pages: 3666-3675
Number of pages: 10
ISBN (Electronic): 9798400706868
DOI: 10.1145/3664647.3681616
Publication status: Published - 28 Oct 2024
Event: 32nd ACM International Conference on Multimedia, MM 2024 - Melbourne, Australia
Duration: 28 Oct 2024 - 1 Nov 2024

Publication series

Name: MM 2024 - Proceedings of the 32nd ACM International Conference on Multimedia

Conference

Conference: 32nd ACM International Conference on Multimedia, MM 2024
Country/Territory: Australia
City: Melbourne
Period: 28/10/24 - 1/11/24

Keywords

  • affordance
  • large language model
  • scene synthesis
  • user demand


Cite this

Jiang, H., Song, L., Weng, D., Sun, Z., Li, H., Dongye, X., & Zhang, Z. (2024). In Situ 3D Scene Synthesis for Ubiquitous Embodied Interfaces. In MM 2024 - Proceedings of the 32nd ACM International Conference on Multimedia (pp. 3666-3675). (MM 2024 - Proceedings of the 32nd ACM International Conference on Multimedia). Association for Computing Machinery, Inc. https://doi.org/10.1145/3664647.3681616