A Novel Audio-Oriented Learning Strategies for Character Recognition

Changbin Lu, Guangyu Gao*

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

Abstract

In this paper, we propose robust audio-oriented learning strategies to address character recognition in movies and TV series. Identifying the major characters in movies and TV series has attracted considerable research interest. Most existing work has explored character recognition and retrieval based on visual appearance, but visual appearance is inconsistent across a whole video. Our approach focuses mainly on audio and has the following features: (i) we extract both spectral and temporal audio features, namely Mel-scale Frequency Cepstral Coefficients (MFCC), prosodic features, average pause length, speaking rate, pitch, and short-time energy, together with complementary Gabor features; (ii) we adopt the Multi-Task Joint Sparse Representation and Recognition (MTJSRC) model for learning with all the features except Gabor, and an SVM model for the Gabor features; (iii) treating these original features as seeds, we extend the training set with samples from talk shows using semi-supervised learning; (iv) we introduce a Conditional Random Field (CRF) model that takes time-sequence constraints into account to refine the final labelling. Experimental results demonstrate the effectiveness of our approach.
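
As a rough illustration of step (iv), the sketch below shows how a temporal constraint can refine per-segment speaker labels. It is not the authors' CRF: it is a plain Viterbi decode over a linear chain with one hand-set penalty for changing label between consecutive segments. The function name `smooth_labels`, the `switch_penalty` value, and the example scores are all assumptions made for illustration.

```python
def smooth_labels(segment_scores, switch_penalty=2.0):
    """Pick the best label sequence under a penalty for changing label.

    segment_scores: list of dicts mapping label -> log-score, one dict per
        audio segment (hypothetical output of a per-segment classifier).
    switch_penalty: cost paid whenever two consecutive segments get
        different labels (assumed value, not taken from the paper).
    """
    if not segment_scores:
        return []
    labels = sorted(segment_scores[0])
    # best[l] is the score of the best path so far that ends in label l.
    best = {l: segment_scores[0][l] for l in labels}
    back_pointers = []
    for scores in segment_scores[1:]:
        new_best, pointers = {}, {}
        for cur in labels:
            # Score of arriving at `cur` from each possible previous label.
            def arrive(prev):
                return best[prev] - (0.0 if prev == cur else switch_penalty)
            prev = max(labels, key=arrive)
            new_best[cur] = arrive(prev) + scores[cur]
            pointers[cur] = prev
        best = new_best
        back_pointers.append(pointers)
    # Trace the best path backwards from the best final label.
    last = max(labels, key=lambda l: best[l])
    path = [last]
    for pointers in reversed(back_pointers):
        last = pointers[last]
        path.append(last)
    path.reverse()
    return path


# Hypothetical log-scores for five segments and two characters.
# Segment 2 weakly favours "B"; its neighbours strongly favour "A".
example = [
    {"A": -0.1, "B": -2.3},
    {"A": -0.1, "B": -2.3},
    {"A": -0.9, "B": -0.5},
    {"A": -0.1, "B": -2.3},
    {"A": -0.1, "B": -2.3},
]
independent = smooth_labels(example, switch_penalty=0.0)
smoothed = smooth_labels(example, switch_penalty=2.0)
```

With no penalty the decode reduces to choosing the best label per segment, so the isolated "B" survives. With the penalty, the weak evidence for "B" does not pay for two label changes and the segment is relabelled "A". A trained CRF would learn such transition weights from data rather than fix them by hand.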

Original language: English
Title of host publication: Proceedings - 2016 International Conference on Virtual Reality and Visualization, ICVRV 2016
Editors: Dandan Ding, Dangxiao Wang, Jian Chen, Xun Luo
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 459-464
Number of pages: 6
ISBN (Electronic): 9781509051885
Publication status: Published - 1 Jun 2017
Externally published: Yes
Event: 6th International Conference on Virtual Reality and Visualization, ICVRV 2016 - Hangzhou, Zhejiang, China
Duration: 24 Sept 2016 to 26 Sept 2016

Publication series

Name: Proceedings - 2016 International Conference on Virtual Reality and Visualization, ICVRV 2016

Conference

Conference: 6th International Conference on Virtual Reality and Visualization, ICVRV 2016
Country/Territory: China
City: Hangzhou, Zhejiang
Period: 24/09/16 to 26/09/16

Keywords

  • Character recognition
  • Conditional Random Field
  • MFCC
  • Sparse Representation
  • Support Vector Machine
