A speech-video synchrony quality metric using COIA

Yaodu Wei*, Xiang Xie, Jingming Kuang, Xinlu Han

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

2 Citations (Scopus)

Abstract

A quality model was built to assess the influence of speech-video asynchrony on audio-visual quality perception. The audio-visual contents were separated into two categories, "speaker inside" and "speaker outside", depending on whether the speaker is visible in the video. For the first category, the speech was shifted on a small scale. DCT and MFCC coefficients were calculated from the video and the speech, respectively. Co-inertia Analysis (CoIA) was used to measure the speech-video correlation, and as the speech was progressively shifted, a correlation curve emerged. The curve was modeled by a Gaussian function, which was then used to predict the perceptual quality. For the "speaker outside" category, a Gaussian curve was used to predict the perceptual quality. A subjective test demonstrated the effectiveness of the proposed method.
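As a rough illustration of the correlation-curve step described in the abstract, the following sketch computes a first-axis co-inertia correlation between two feature matrices at several frame shifts. It is not the authors' implementation: the function names, the use of the first co-inertia axis only, the frame-based shift, and the synthetic features standing in for DCT and MFCC coefficients are all assumptions made for this example. The Gaussian modeling of the curve is not shown.

```python
import numpy as np

def coia_correlation(video_feats, audio_feats):
    """Correlation of the first co-inertia score pair (rows = frames)."""
    X = video_feats - video_feats.mean(axis=0)
    Y = audio_feats - audio_feats.mean(axis=0)
    # Co-inertia axes are the singular vectors of the cross-covariance matrix.
    cross_cov = X.T @ Y / X.shape[0]
    U, _, Vt = np.linalg.svd(cross_cov, full_matrices=False)
    x_scores = X @ U[:, 0]
    y_scores = Y @ Vt[0]
    return abs(np.corrcoef(x_scores, y_scores)[0, 1])

def correlation_curve(video_feats, audio_feats, max_shift):
    """Evaluate the correlation for shifts -max_shift..max_shift (in frames).

    A shift k pairs video frame t with audio frame t + k, so a positive
    peak position means the audio lags the video.
    """
    n = video_feats.shape[0]
    shifts = np.arange(-max_shift, max_shift + 1)
    curve = []
    for k in shifts:
        if k >= 0:
            v, a = video_feats[:n - k], audio_feats[k:]
        else:
            v, a = video_feats[-k:], audio_feats[:n + k]
        curve.append(coia_correlation(v, a))
    return shifts, np.array(curve)

# Synthetic stand-ins for the real features (assumption: one shared latent
# signal drives both modalities, plus independent noise).
rng = np.random.default_rng(0)
n_frames = 400
latent = rng.standard_normal((n_frames, 1))
video = latent @ rng.standard_normal((1, 6)) + 0.1 * rng.standard_normal((n_frames, 6))
audio = latent @ rng.standard_normal((1, 5)) + 0.1 * rng.standard_normal((n_frames, 5))

# Delay the audio by 3 frames; the curve should peak at or near shift +3.
audio_delayed = np.roll(audio, 3, axis=0)
shifts, curve = correlation_curve(video, audio_delayed, max_shift=5)
print("estimated audio lag (frames):", int(shifts[np.argmax(curve)]))
```

In this toy setup the peak position recovers the artificial delay; with real DCT and MFCC features the curve would be smoother and would depend on the content.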

Original language: English
Title of host publication: PV 2010 - 2010 18th International Packet Video Workshop
Pages: 173-177
Number of pages: 5
Publication status: Published - 2010
Event: 2010 18th International Packet Video Workshop, PV 2010 - Hong Kong, China
Duration: 13 Dec 2010 to 14 Dec 2010

Publication series

Name: PV 2010 - 2010 18th International Packet Video Workshop

Conference

Conference: 2010 18th International Packet Video Workshop, PV 2010
Country/Territory: China
City: Hong Kong
Period: 13/12/10 to 14/12/10

Keywords

  • Asynchrony
  • Audio-visual quality
  • Co-inertia analysis
  • QVGA
  • Speech
