TY - GEN
T1 - A Novel Audio-Oriented Learning Strategies for Character Recognition
AU - Lu, Changbin
AU - Gao, Guangyu
N1 - Publisher Copyright: © 2016 IEEE.
PY - 2017/6/1
Y1 - 2017/6/1
N2 - In this paper, we propose a robust audio-oriented learning strategy to address character recognition in movies and TV series. Identifying major characters in movies and TV series has drawn great interest from researchers. Most existing work has explored character recognition and retrieval based on visual appearance, but visual appearance is inconsistent throughout a video. Our approach, which focuses mainly on audio, has four components: (i) we extract both spectral and temporal audio features, namely Mel-scale Frequency Cepstral Coefficients (MFCC), prosodic features, average pause length, speaking rate, pitch, and short-time energy, together with complementary Gabor features; (ii) we adopt a Multi-Task Joint Sparse Representation and Recognition (MTJSRC) model for learning with all features except Gabor, and an SVM model for the Gabor features; (iii) treating these original features as seeds, we extend the training set with data from talk shows using semi-supervised learning; (iv) we introduce a Conditional Random Field (CRF) model that takes temporal-sequence constraints into account to refine the final labelling. Experimental results demonstrate the effectiveness of our approach.
AB - In this paper, we propose a robust audio-oriented learning strategy to address character recognition in movies and TV series. Identifying major characters in movies and TV series has drawn great interest from researchers. Most existing work has explored character recognition and retrieval based on visual appearance, but visual appearance is inconsistent throughout a video. Our approach, which focuses mainly on audio, has four components: (i) we extract both spectral and temporal audio features, namely Mel-scale Frequency Cepstral Coefficients (MFCC), prosodic features, average pause length, speaking rate, pitch, and short-time energy, together with complementary Gabor features; (ii) we adopt a Multi-Task Joint Sparse Representation and Recognition (MTJSRC) model for learning with all features except Gabor, and an SVM model for the Gabor features; (iii) treating these original features as seeds, we extend the training set with data from talk shows using semi-supervised learning; (iv) we introduce a Conditional Random Field (CRF) model that takes temporal-sequence constraints into account to refine the final labelling. Experimental results demonstrate the effectiveness of our approach.
KW - Character recognition
KW - Conditional Random Field
KW - MFCC
KW - Sparse Representation
KW - Support Vector Machine
UR - http://www.scopus.com/inward/record.url?scp=85025473225&partnerID=8YFLogxK
U2 - 10.1109/ICVRV.2016.84
DO - 10.1109/ICVRV.2016.84
M3 - Conference contribution
AN - SCOPUS:85025473225
T3 - Proceedings - 2016 International Conference on Virtual Reality and Visualization, ICVRV 2016
SP - 459
EP - 464
BT - Proceedings - 2016 International Conference on Virtual Reality and Visualization, ICVRV 2016
A2 - Ding, Dandan
A2 - Wang, Dangxiao
A2 - Chen, Jian
A2 - Luo, Xun
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 6th International Conference on Virtual Reality and Visualization, ICVRV 2016
Y2 - 24 September 2016 through 26 September 2016
ER -