From Notation to Gesture: Virtual Conductor Gesture Generation in VR Via Structured Score Semantics

  • Haozhe Ma
  • Yuxin Shen
  • Wei Liang*
  • Yunde Jia*
  • *Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

Abstract

A conductor avatar plays a dual role in immersive Virtual Reality (VR) interactive systems by interpreting musical scores and guiding orchestral performance. Rule-based score-driven methods ensure precise synchronization with predefined conducting templates or videos, but are constrained by pre-authored data. Audio-driven frameworks offer greater adaptability through real-time gesture generation but often fail to capture the symbolic semantics of musical scores. To overcome these limitations, we propose a novel score-driven gesture generation framework that translates symbolic musical representations into plausible conducting gestures. Our approach adopts a two-stage architecture, combining a comparative learning stage for pre-training a score encoder with a generative learning stage for gesture synthesis. The score encoder explicitly models musical features such as tempo, chord, intensity, and cycle semantics, which directly inform gesture generation. To support this research, we introduce the Multimodal Symphonic Conducting Dataset (MSCD), the first synchronized dataset comprising conducting gestures, performance audio, and editable symbolic scores, effectively bridging the gap between musical semantics and gesture synthesis. Qualitative and quantitative analyses demonstrate the effectiveness of our approach, while a user study identifies the strengths and limitations of the current work.

Original language: English
Title of host publication: Proceedings - 2025 IEEE International Symposium on Mixed and Augmented Reality, ISMAR 2025
Editors: Ulrich Eck, Gun Lee, Alexander Plopski, Missie Smith, Qi Sun, Markus Tatzgern
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 603-613
Number of pages: 11
ISBN (Electronic): 9798331587611
DOIs
Publication status: Published - 2025
Event: 24th IEEE International Symposium on Mixed and Augmented Reality, ISMAR 2025 - Daejeon, Korea, Republic of
Duration: 8 Oct 2025 - 12 Oct 2025

Publication series

Name: Proceedings - 2025 IEEE International Symposium on Mixed and Augmented Reality, ISMAR 2025

Conference

Conference: 24th IEEE International Symposium on Mixed and Augmented Reality, ISMAR 2025
Country/Territory: Korea, Republic of
City: Daejeon
Period: 8/10/25 - 12/10/25

Keywords

  • human-computer interaction
  • symphony
  • virtual reality
