TY - GEN
T1 - Generating Co-Speech Gestures for Virtual Agents from Multimodal Information Based on Transformer
AU - Yu, Yue
AU - Shi, Jiande
N1 - Publisher Copyright: © 2023 IEEE.
PY - 2023
Y1 - 2023
N2 - To generate co-speech gestures for virtual agents and enhance the correlation between gestures and input modalities, we propose a Transformer-based model that encodes information from four modalities (audio waveform, Mel-spectrogram, text, and speaker IDs). For the Mel-spectrogram modality, we design a Mel-spectrogram encoder based on a pre-trained Swin Transformer model to extract audio spectrum features hierarchically. For the text modality, we use a Transformer encoder to extract text features aligned with the audio. We evaluate the model on the TED-Gesture dataset. Compared with state-of-the-art methods, we improve the mean absolute joint error by 2.33%, the mean acceleration difference by 15.01%, and the Fréchet gesture distance by 59.32%.
AB - To generate co-speech gestures for virtual agents and enhance the correlation between gestures and input modalities, we propose a Transformer-based model that encodes information from four modalities (audio waveform, Mel-spectrogram, text, and speaker IDs). For the Mel-spectrogram modality, we design a Mel-spectrogram encoder based on a pre-trained Swin Transformer model to extract audio spectrum features hierarchically. For the text modality, we use a Transformer encoder to extract text features aligned with the audio. We evaluate the model on the TED-Gesture dataset. Compared with state-of-the-art methods, we improve the mean absolute joint error by 2.33%, the mean acceleration difference by 15.01%, and the Fréchet gesture distance by 59.32%.
KW - Computer systems organization
KW - Computing methodologies-Co-Speech Gestures
KW - Computing methodologies-Virtual Agents
KW - Transformer
UR - http://www.scopus.com/inward/record.url?scp=85159687023&partnerID=8YFLogxK
U2 - 10.1109/VRW58643.2023.00286
DO - 10.1109/VRW58643.2023.00286
M3 - Conference contribution
AN - SCOPUS:85159687023
T3 - Proceedings - 2023 IEEE Conference on Virtual Reality and 3D User Interfaces Abstracts and Workshops, VRW 2023
SP - 887
EP - 888
BT - Proceedings - 2023 IEEE Conference on Virtual Reality and 3D User Interfaces Abstracts and Workshops, VRW 2023
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2023 IEEE Conference on Virtual Reality and 3D User Interfaces Abstracts and Workshops, VRW 2023
Y2 - 25 March 2023 through 29 March 2023
ER -