TY - GEN
T1 - Automatic personality perception from speech in Mandarin
AU - Zhu, Minxian
AU - Xie, Xiang
AU - Zhang, Liqiang
AU - Wang, Jing
N1 - Publisher Copyright:
© 2018 IEEE
PY - 2018/7/2
Y1 - 2018/7/2
N2 - Research on speech-based personality perception has been actively conducted in recent years. However, most of it focuses on French, while research on Chinese corpora has not been reported. This paper investigates automatic perception of speakers' personality from speech in Mandarin. The studied personality traits are extended from the standard Big-Five to 6 sub-traits for each, leading to a total of 30 traits. As for the modeling approach, a traditional SVM system with standard prosodic feature extraction is studied as the baseline. A novel skip-frame LSTM system is proposed, which uses skip-frame sampling to augment the training data while retaining prosodic variation over a long time span. The LSTM system learns personality information directly from frame-level descriptors (MFCCs) instead of the manually designed prosodic features used by the SVM system. The results of both systems show that the extraversion trait is the most easily perceived while openness is the most difficult, which agrees with the results for French. In contrast, the high accuracy on agreeableness differs from the previous results for French, which may relate to different cultural backgrounds. In addition, the relations between the Big-Five traits and their sub-traits are discussed. Finally, the proposed skip-frame LSTM system clearly outperforms the SVM system overall, demonstrating that LSTM can serve as an effective method for automatic personality perception even in low-resource conditions.
AB - Research on speech-based personality perception has been actively conducted in recent years. However, most of it focuses on French, while research on Chinese corpora has not been reported. This paper investigates automatic perception of speakers' personality from speech in Mandarin. The studied personality traits are extended from the standard Big-Five to 6 sub-traits for each, leading to a total of 30 traits. As for the modeling approach, a traditional SVM system with standard prosodic feature extraction is studied as the baseline. A novel skip-frame LSTM system is proposed, which uses skip-frame sampling to augment the training data while retaining prosodic variation over a long time span. The LSTM system learns personality information directly from frame-level descriptors (MFCCs) instead of the manually designed prosodic features used by the SVM system. The results of both systems show that the extraversion trait is the most easily perceived while openness is the most difficult, which agrees with the results for French. In contrast, the high accuracy on agreeableness differs from the previous results for French, which may relate to different cultural backgrounds. In addition, the relations between the Big-Five traits and their sub-traits are discussed. Finally, the proposed skip-frame LSTM system clearly outperforms the SVM system overall, demonstrating that LSTM can serve as an effective method for automatic personality perception even in low-resource conditions.
KW - Automatic Personality Perception
KW - Long Short-Term Memory Network
KW - Mandarin
KW - Skip-Frame Sampling
KW - Support Vector Machine
UR - http://www.scopus.com/inward/record.url?scp=85065858438&partnerID=8YFLogxK
U2 - 10.1109/ISCSLP.2018.8706692
DO - 10.1109/ISCSLP.2018.8706692
M3 - Conference contribution
AN - SCOPUS:85065858438
T3 - 2018 11th International Symposium on Chinese Spoken Language Processing, ISCSLP 2018 - Proceedings
SP - 309
EP - 313
BT - 2018 11th International Symposium on Chinese Spoken Language Processing, ISCSLP 2018 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 11th International Symposium on Chinese Spoken Language Processing, ISCSLP 2018
Y2 - 26 November 2018 through 29 November 2018
ER -