TY - GEN
T1 - From Notation to Gesture
T2 - 24th IEEE International Symposium on Mixed and Augmented Reality, ISMAR 2025
AU - Ma, Haozhe
AU - Shen, Yuxin
AU - Liang, Wei
AU - Jia, Yunde
N1 - Publisher Copyright:
© 2025 IEEE.
PY - 2025
Y1 - 2025
N2 - A conductor avatar plays a dual role in immersive Virtual Reality (VR) interactive systems by interpreting musical scores and guiding orchestral performance. Rule-based score-driven methods ensure precise synchronization with predefined conducting templates or videos but are constrained by pre-authored data. Audio-driven frameworks offer greater adaptability through real-time gesture generation but often fail to capture the symbolic semantics of musical scores. To overcome these limitations, we propose a novel score-driven gesture generation framework that translates symbolic musical representations into plausible conducting gestures. Our approach adopts a two-stage architecture, combining a contrastive learning stage for pre-training a score encoder with a generative learning stage for gesture synthesis. The score encoder explicitly models musical features such as tempo, chord, intensity, and cycle semantics, directly informing gesture generation. To support this research, we introduce the Multimodal Symphonic Conducting Dataset (MSCD), the first synchronized dataset comprising conducting gestures, performance audio, and editable symbolic scores, effectively bridging the gap between musical semantics and gesture synthesis. Qualitative and quantitative analyses demonstrate the effectiveness of our approach, and a user study identifies the strengths and limitations of the current work.
AB - A conductor avatar plays a dual role in immersive Virtual Reality (VR) interactive systems by interpreting musical scores and guiding orchestral performance. Rule-based score-driven methods ensure precise synchronization with predefined conducting templates or videos but are constrained by pre-authored data. Audio-driven frameworks offer greater adaptability through real-time gesture generation but often fail to capture the symbolic semantics of musical scores. To overcome these limitations, we propose a novel score-driven gesture generation framework that translates symbolic musical representations into plausible conducting gestures. Our approach adopts a two-stage architecture, combining a contrastive learning stage for pre-training a score encoder with a generative learning stage for gesture synthesis. The score encoder explicitly models musical features such as tempo, chord, intensity, and cycle semantics, directly informing gesture generation. To support this research, we introduce the Multimodal Symphonic Conducting Dataset (MSCD), the first synchronized dataset comprising conducting gestures, performance audio, and editable symbolic scores, effectively bridging the gap between musical semantics and gesture synthesis. Qualitative and quantitative analyses demonstrate the effectiveness of our approach, and a user study identifies the strengths and limitations of the current work.
KW - human-computer interaction
KW - symphony
KW - virtual reality
UR - https://www.scopus.com/pages/publications/105025037200
U2 - 10.1109/ISMAR67309.2025.00070
DO - 10.1109/ISMAR67309.2025.00070
M3 - Conference contribution
AN - SCOPUS:105025037200
T3 - Proceedings - 2025 IEEE International Symposium on Mixed and Augmented Reality, ISMAR 2025
SP - 603
EP - 613
BT - Proceedings - 2025 IEEE International Symposium on Mixed and Augmented Reality, ISMAR 2025
A2 - Eck, Ulrich
A2 - Lee, Gun
A2 - Plopski, Alexander
A2 - Smith, Missie
A2 - Sun, Qi
A2 - Tatzgern, Markus
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 8 October 2025 through 12 October 2025
ER -