Comic-guided speech synthesis

Yujia Wang; Wenguan Wang; Wei Liang; Lap Fai Yu

doi:10.1145/3355089.3356487


Yujia Wang, Wenguan Wang, Wei Liang*, Lap Fai Yu

* Corresponding author for this work

School of Computer Science

Research output: Contribution to journal › Article › Peer-review

17 Citations (Scopus)

Abstract

We introduce a novel approach for synthesizing realistic speech for comics. Taking a comic page as input, our approach synthesizes speech for each comic character, following the reading flow. It adopts a cascading strategy that synthesizes speech in two stages: Comic Visual Analysis and Comic Speech Synthesis. In the first stage, the input comic page is analyzed to identify the gender and age of each character, as well as the text each character speaks and its corresponding emotion. Guided by this analysis, the second stage synthesizes realistic speech for each character that is consistent with the visual observations. Our experiments show that the proposed approach can synthesize realistic and lively speech for different types of comics. Perceptual studies performed on the synthesis results of multiple sample comics validate the efficacy of our approach.
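The cascade described in the abstract can be pictured as a two-stage data flow. The Python sketch below is only an illustration under stated assumptions: the `SpeechLine` record, its field names, and the `analyze` / `synthesize` callables are hypothetical stand-ins, not the authors' models or code. The toy stubs simply show how stage-one attributes (gender, age, spoken text, emotion, reading order) could be handed to stage two.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class SpeechLine:
    """One speech balloon as stage one might describe it (hypothetical fields)."""
    order: int       # position in the page's reading flow
    character: str   # character identifier
    gender: str      # estimated from the character's appearance
    age: str         # coarse age group, e.g. "child" or "adult"
    text: str        # text the character speaks
    emotion: str     # emotion associated with this line


def synthesize_page(
    analyze: Callable[[object], List[SpeechLine]],
    synthesize: Callable[[SpeechLine], bytes],
    page: object,
) -> List[bytes]:
    """Hypothetical cascade: visual analysis first, then attribute-conditioned synthesis."""
    # Stage 1 (Comic Visual Analysis): extract per-line attributes from the page.
    lines = analyze(page)
    # Keep the output in reading order.
    lines = sorted(lines, key=lambda s: s.order)
    # Stage 2 (Comic Speech Synthesis): one clip per line, guided by stage-1 attributes.
    return [synthesize(s) for s in lines]


# Toy stubs standing in for real models (assumptions, for illustration only).
def fake_analyze(page: object) -> List[SpeechLine]:
    return [
        SpeechLine(2, "B", "female", "adult", "Hi!", "happy"),
        SpeechLine(1, "A", "male", "child", "Hello?", "surprised"),
    ]


def fake_synthesize(line: SpeechLine) -> bytes:
    # A real system would return audio; here we encode the conditioning info.
    return f"{line.character}:{line.emotion}:{line.text}".encode()


clips = synthesize_page(fake_analyze, fake_synthesize, page=None)
```

Passing the two stages in as callables keeps the sketch agnostic about how each stage is actually implemented; only the hand-off of per-character attributes in reading order is shown.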

Original language	English
Article number	187
Journal	ACM Transactions on Graphics
Volume	38
Issue	6
DOI	https://doi.org/10.1145/3355089.3356487
Publication status	Published - Nov 2019


Cite this

@article{f7a5854db87d4682bfc7f83a6dbea424,
  title = "Comic-guided speech synthesis",
  abstract = "We introduce a novel approach for synthesizing realistic speeches for comics. Using a comic page as input, our approach synthesizes speeches for each comic character following the reading flow. It adopts a cascading strategy to synthesize speeches in two stages: Comic Visual Analysis and Comic Speech Synthesis. In the first stage, the input comic page is analyzed to identify the gender and age of the characters, as well as texts each character speaks and corresponding emotion. Guided by this analysis, in the second stage, our approach synthesizes realistic speeches for each character, which are consistent with the visual observations. Our experiments show that the proposed approach can synthesize realistic and lively speeches for different types of comics. Perceptual studies performed on the synthesis results of multiple sample comics validate the efficacy of our approach.",
  keywords = "Comics, Deep learning, Speech synthesis",
  author = "Yujia Wang and Wenguan Wang and Wei Liang and Yu, {Lap Fai}",
  note = "Publisher Copyright: {\textcopyright} 2019 Association for Computing Machinery.",
  year = "2019",
  month = nov,
  doi = "10.1145/3355089.3356487",
  language = "English",
  volume = "38",
  journal = "ACM Transactions on Graphics",
  issn = "0730-0301",
  publisher = "Association for Computing Machinery (ACM)",
  number = "6",
}

TY  - JOUR
T1  - Comic-guided speech synthesis
AU  - Wang, Yujia
AU  - Wang, Wenguan
AU  - Liang, Wei
AU  - Yu, Lap Fai
PY  - 2019/11
Y1  - 2019/11
N2  - We introduce a novel approach for synthesizing realistic speeches for comics. Using a comic page as input, our approach synthesizes speeches for each comic character following the reading flow. It adopts a cascading strategy to synthesize speeches in two stages: Comic Visual Analysis and Comic Speech Synthesis. In the first stage, the input comic page is analyzed to identify the gender and age of the characters, as well as texts each character speaks and corresponding emotion. Guided by this analysis, in the second stage, our approach synthesizes realistic speeches for each character, which are consistent with the visual observations. Our experiments show that the proposed approach can synthesize realistic and lively speeches for different types of comics. Perceptual studies performed on the synthesis results of multiple sample comics validate the efficacy of our approach.
AB  - We introduce a novel approach for synthesizing realistic speeches for comics. Using a comic page as input, our approach synthesizes speeches for each comic character following the reading flow. It adopts a cascading strategy to synthesize speeches in two stages: Comic Visual Analysis and Comic Speech Synthesis. In the first stage, the input comic page is analyzed to identify the gender and age of the characters, as well as texts each character speaks and corresponding emotion. Guided by this analysis, in the second stage, our approach synthesizes realistic speeches for each character, which are consistent with the visual observations. Our experiments show that the proposed approach can synthesize realistic and lively speeches for different types of comics. Perceptual studies performed on the synthesis results of multiple sample comics validate the efficacy of our approach.
KW  - Comics
KW  - Deep learning
KW  - Speech synthesis
UR  - http://www.scopus.com/inward/record.url?scp=85078882730&partnerID=8YFLogxK
U2  - 10.1145/3355089.3356487
DO  - 10.1145/3355089.3356487
M3  - Article
AN  - SCOPUS:85078882730
SN  - 0730-0301
VL  - 38
JO  - ACM Transactions on Graphics
JF  - ACM Transactions on Graphics
IS  - 6
M1  - 187
ER  - 
