Toward automatic audio description generation for accessible videos

Yujia Wang, Wei Liang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

39 Citations (Scopus)

Abstract

Video accessibility is essential for people with visual impairments. Audio descriptions describe what is happening on-screen, e.g., physical actions, facial expressions, and scene changes. Generating high-quality audio descriptions requires a lot of manual description generation [50]. To address this accessibility obstacle, we built a system that analyzes the audiovisual contents of a video and generates the audio descriptions. The system consisted of three modules: AD insertion time prediction, AD generation, and AD optimization. We evaluated the quality of our system on five types of videos by conducting qualitative studies with 20 sighted users and 12 users who were blind or visually impaired. Our findings revealed how audio description preferences varied with user types and video types. Based on our study's analysis, we provided recommendations for the development of future audio description generation technologies.

Original language: English
Title of host publication: CHI 2021 - Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems
Subtitle of host publication: Making Waves, Combining Strengths
Publisher: Association for Computing Machinery
ISBN (Electronic): 9781450380966
Publication status: Published - 6 May 2021
Event: 10th International Conference on Materials Processing and Characterisation, ICMPC 2020 - Mathura, U.P., India
Duration: 21 Feb 2020 → 23 Feb 2020

Publication series

Name: Conference on Human Factors in Computing Systems - Proceedings

Conference

Conference: 10th International Conference on Materials Processing and Characterisation, ICMPC 2020
Country/Territory: India
City: Mathura, U.P.
Period: 21/02/20 → 23/02/20

Keywords

  • Audio description
  • Audio-visual consistency
  • Video captioning, sentence-level embedding, accessibility
  • Video description
