Toward automatic audio description generation for accessible videos

Yujia Wang; Wei Liang

doi:10.1145/3411764.3445347

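School of Computer Science

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Peer-reviewed

39 Citations (Scopus)

Abstract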

Video accessibility is essential for people with visual impairments. Audio descriptions describe what is happening on-screen, e.g., physical actions, facial expressions, and scene changes. Generating high-quality audio descriptions requires a lot of manual description generation [50]. To address this accessibility obstacle, we built a system that analyzes the audiovisual contents of a video and generates the audio descriptions. The system consisted of three modules: AD insertion time prediction, AD generation, and AD optimization. We evaluated the quality of our system on five types of videos by conducting qualitative studies with 20 sighted users and 12 users who were blind or visually impaired. Our findings revealed how audio description preferences varied with user types and video types. Based on our study's analysis, we provided recommendations for the development of future audio description generation technologies.

Original language	English
Title of host publication	CHI 2021 - Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems
Subtitle of host publication	Making Waves, Combining Strengths
Publisher	Association for Computing Machinery
ISBN (Electronic)	9781450380966
DOI	https://doi.org/10.1145/3411764.3445347
Publication status	Published - 6 May 2021
Event	10th International Conference on Materials Processing and Characterisation, ICMPC 2020 - Mathura, U.P., India. Duration: 21 Feb 2020 → 23 Feb 2020

Publication series

Name	Conference on Human Factors in Computing Systems - Proceedings

Conference

Conference	10th International Conference on Materials Processing and Characterisation, ICMPC 2020
Country/Territory	India
City	Mathura, U.P.
Period	21/02/20 → 23/02/20

Access to document

10.1145/3411764.3445347

Other files and links

Link to publication in Scopus

Cite this

@inproceedings{24941ef0b14844ecb6b8a167a0b72d52,

title = "Toward automatic audio description generation for accessible videos",

abstract = "Video accessibility is essential for people with visual impairments. Audio descriptions describe what is happening on-screen, e.g., physical actions, facial expressions, and scene changes. Generating high-quality audio descriptions requires a lot of manual description generation [50]. To address this accessibility obstacle, we built a system that analyzes the audiovisual contents of a video and generates the audio descriptions. The system consisted of three modules: AD insertion time prediction, AD generation, and AD optimization. We evaluated the quality of our system on five types of videos by conducting qualitative studies with 20 sighted users and 12 users who were blind or visually impaired. Our findings revealed how audio description preferences varied with user types and video types. Based on our study's analysis, we provided recommendations for the development of future audio description generation technologies.",

keywords = "Audio description, Audio-visual consistency, Video captioning, sentence-level embedding, accessibility, Video description",

author = "Yujia Wang and Wei Liang",

note = "Publisher Copyright: {\textcopyright} 2021 ACM.; 10th International Conference on Materials Processing and Characterisation, ICMPC 2020 ; Conference date: 21-02-2020 Through 23-02-2020",

year = "2021",

month = may,

day = "6",

doi = "10.1145/3411764.3445347",

language = "English",

series = "Conference on Human Factors in Computing Systems - Proceedings",

publisher = "Association for Computing Machinery",

booktitle = "CHI 2021 - Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems",

}

Wang, Y & Liang, W 2021, Toward automatic audio description generation for accessible videos. in CHI 2021 - Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems: Making Waves, Combining Strengths. Conference on Human Factors in Computing Systems - Proceedings, Association for Computing Machinery, 10th International Conference on Materials Processing and Characterisation, ICMPC 2020, Mathura, U.P., India, 21/02/20. https://doi.org/10.1145/3411764.3445347

Toward automatic audio description generation for accessible videos. / Wang, Yujia; Liang, Wei.
CHI 2021 - Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems: Making Waves, Combining Strengths. Association for Computing Machinery, 2021. (Conference on Human Factors in Computing Systems - Proceedings).

TY - GEN

T1 - Toward automatic audio description generation for accessible videos

AU - Wang, Yujia

AU - Liang, Wei

PY - 2021/5/6

Y1 - 2021/5/6

N2 - Video accessibility is essential for people with visual impairments. Audio descriptions describe what is happening on-screen, e.g., physical actions, facial expressions, and scene changes. Generating high-quality audio descriptions requires a lot of manual description generation [50]. To address this accessibility obstacle, we built a system that analyzes the audiovisual contents of a video and generates the audio descriptions. The system consisted of three modules: AD insertion time prediction, AD generation, and AD optimization. We evaluated the quality of our system on five types of videos by conducting qualitative studies with 20 sighted users and 12 users who were blind or visually impaired. Our findings revealed how audio description preferences varied with user types and video types. Based on our study's analysis, we provided recommendations for the development of future audio description generation technologies.

AB - Video accessibility is essential for people with visual impairments. Audio descriptions describe what is happening on-screen, e.g., physical actions, facial expressions, and scene changes. Generating high-quality audio descriptions requires a lot of manual description generation [50]. To address this accessibility obstacle, we built a system that analyzes the audiovisual contents of a video and generates the audio descriptions. The system consisted of three modules: AD insertion time prediction, AD generation, and AD optimization. We evaluated the quality of our system on five types of videos by conducting qualitative studies with 20 sighted users and 12 users who were blind or visually impaired. Our findings revealed how audio description preferences varied with user types and video types. Based on our study's analysis, we provided recommendations for the development of future audio description generation technologies.

KW - Audio description

KW - Audio-visual consistency

KW - Video captioning

KW - sentence-level embedding

KW - accessibility

KW - Video description

UR - http://www.scopus.com/inward/record.url?scp=85106756267&partnerID=8YFLogxK

U2 - 10.1145/3411764.3445347

DO - 10.1145/3411764.3445347

M3 - Conference contribution

AN - SCOPUS:85106756267

T3 - Conference on Human Factors in Computing Systems - Proceedings

BT - CHI 2021 - Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems

PB - Association for Computing Machinery

T2 - 10th International Conference on Materials Processing and Characterisation, ICMPC 2020

Y2 - 21 February 2020 through 23 February 2020

ER -
