TY - GEN
T1 - Toward automatic audio description generation for accessible videos
AU - Wang, Yujia
AU - Liang, Wei
N1 - Publisher Copyright: © 2021 ACM.
PY - 2021/5/6
Y1 - 2021/5/6
N2 - Video accessibility is essential for people with visual impairments. Audio descriptions describe what is happening on-screen, e.g., physical actions, facial expressions, and scene changes. Generating high-quality audio descriptions requires a lot of manual description generation [50]. To address this accessibility obstacle, we built a system that analyzes the audiovisual contents of a video and generates the audio descriptions. The system consisted of three modules: AD insertion time prediction, AD generation, and AD optimization. We evaluated the quality of our system on five types of videos by conducting qualitative studies with 20 sighted users and 12 users who were blind or visually impaired. Our findings revealed how audio description preferences varied with user types and video types. Based on our study's analysis, we provided recommendations for the development of future audio description generation technologies.
AB - Video accessibility is essential for people with visual impairments. Audio descriptions describe what is happening on-screen, e.g., physical actions, facial expressions, and scene changes. Generating high-quality audio descriptions requires a lot of manual description generation [50]. To address this accessibility obstacle, we built a system that analyzes the audiovisual contents of a video and generates the audio descriptions. The system consisted of three modules: AD insertion time prediction, AD generation, and AD optimization. We evaluated the quality of our system on five types of videos by conducting qualitative studies with 20 sighted users and 12 users who were blind or visually impaired. Our findings revealed how audio description preferences varied with user types and video types. Based on our study's analysis, we provided recommendations for the development of future audio description generation technologies.
KW - Audio description
KW - Audio-visual consistency
KW - Video captioning
KW - Sentence-level embedding
KW - Accessibility
KW - Video description
UR - http://www.scopus.com/inward/record.url?scp=85106756267&partnerID=8YFLogxK
U2 - 10.1145/3411764.3445347
DO - 10.1145/3411764.3445347
M3 - Conference contribution
AN - SCOPUS:85106756267
T3 - Conference on Human Factors in Computing Systems - Proceedings
BT - CHI 2021 - Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems
PB - Association for Computing Machinery
T2 - 2021 CHI Conference on Human Factors in Computing Systems, CHI 2021
Y2 - 8 May 2021 through 13 May 2021
ER -