A dependency parser for spontaneous Chinese spoken language

Ruifang He; Yaru Wang; Dawei Song; Peng Zhang; Yuan Jia; Aijun Li

doi:10.1145/3196278

Ruifang He^*, Yaru Wang, Dawei Song, Peng Zhang, Yuan Jia, Aijun Li

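^*Corresponding author for this work

School of Computer Science

Research output: Contribution to journal › Article › peer-review

3 Citations (Scopus)

Abstract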

Dependency analysis is vital for spoken language understanding in spoken dialogue systems. However, existing research has focused mainly on Western spoken languages and Japanese, among others; little work has addressed dependency parsing of spoken Chinese. Therefore, a new spoken corpus, D-ESCSC (Dependency-Expressive Speech Corpus of Standard Chinese), is built by extending a written Chinese annotation scheme with new dependency relations specific to spoken Chinese. Because spoken Chinese contains typical ungrammatical phenomena such as translocation, repetition, duplication, and omission, a new atom feature related to punctuation and three feature templates are proposed to improve a graph-based dependency parser. Experimental results on the spoken Chinese corpus show that the atom feature and the three templates are effective and that the new parser outperforms the baseline parser. To the best of our knowledge, this is the first work to report dependency parsing results for spoken Chinese.
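The abstract does not spell out the punctuation atom feature or the three templates, so the following is only a minimal sketch of how an arc-factored, graph-based parser might combine a punctuation-based atom feature with feature templates. Every name here (`punct_between`, `arc_features`, `greedy_heads`), the punctuation set, the templates, and the example sentence are illustrative assumptions, not the authors' definitions. The decoder is a plain per-token greedy head choice rather than a proper maximum-spanning-tree search.

```python
# Illustrative sketch only: the feature and templates below are assumptions,
# not the ones defined in the paper.
PUNCT = {"，", "。", "？", "！", "、", ",", ".", "?", "!"}


def punct_between(words, head, dep):
    """Hypothetical atom feature: number of punctuation tokens strictly
    between the head and the dependent."""
    lo, hi = sorted((head, dep))
    return sum(1 for w in words[lo + 1:hi] if w in PUNCT)


def arc_features(words, tags, head, dep):
    """Instantiate a few illustrative feature templates for the arc head -> dep."""
    direction = "R" if head < dep else "L"
    p = punct_between(words, head, dep)
    return [
        f"hw={words[head]}|dw={words[dep]}",                # word pair
        f"ht={tags[head]}|dt={tags[dep]}|dir={direction}",  # tag pair + direction
        f"punct={p}",                                       # atom feature alone
        f"ht={tags[head]}|dt={tags[dep]}|punct={p}",        # tags + atom feature
    ]


def score_arc(weights, words, tags, head, dep):
    """Arc-factored linear score: sum of the weights of the active features."""
    return sum(weights.get(f, 0.0) for f in arc_features(words, tags, head, dep))


def greedy_heads(weights, words, tags):
    """Choose the best-scoring head for each token independently.
    Index 0 is an artificial ROOT; the result is not guaranteed to be a tree."""
    heads = [None]
    for dep in range(1, len(words)):
        candidates = [h for h in range(len(words)) if h != dep]
        heads.append(
            max(candidates, key=lambda h: score_arc(weights, words, tags, h, dep))
        )
    return heads


# Hypothetical example sentence with made-up part-of-speech tags.
words = ["<ROOT>", "好", "，", "我", "去"]
tags = ["ROOT", "VA", "PU", "PN", "VV"]
weights = {"ht=VV|dt=PN|dir=L": 1.0}
heads = greedy_heads(weights, words, tags)
```

In a trained parser the weights would come from a learning algorithm, and decoding would use a tree-constrained search. The sketch only shows where a punctuation feature could enter the arc score.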

Original language	English
Article number	28
Journal	ACM Transactions on Asian and Low-Resource Language Information Processing
Volume	17
Issue number	4
DOI	https://doi.org/10.1145/3196278
Publication status	Published - Jul 2018

Cite this

He, R., Wang, Y., Song, D., Zhang, P., Jia, Y., & Li, A. (2018). A dependency parser for spontaneous Chinese spoken language. ACM Transactions on Asian and Low-Resource Language Information Processing, 17(4), Article 28. https://doi.org/10.1145/3196278

@article{5b01e6fa61f94e049ff701c77e958b3d,
  title = "A dependency parser for spontaneous Chinese spoken language",
  abstract = "Dependency analysis is vital for spoken language understanding in spoken dialogue systems. However, existing research has mainly focused on western spoken languages, Japanese, and so on. Little research has been done for spoken Chinese in terms of dependency parsing. Therefore, the new spoken corpus, D-ESCSC (Dependency-Expressive Speech Corpus of Standard Chinese) is built by adding new dependency relations special to spoken Chinese based on a written Chinese annotation scheme. Since spoken Chinese contains typical ill-grammatical phenomena, e.g., translocation, repetition, duplication, and omission, the new atom feature related to punctuation and three feature templates are proposed to improve a graph-based dependency parser. Experimental results on spoken Chinese corpus show that the atom feature and three templates really work and the new parser outperforms the baseline parser. To our best knowledge, it is the first work to report dependency parsing results of spoken Chinese.",
  keywords = "Dependency parsing, Graph-based model, Spoken language, Spontaneous Chinese",
  author = "Ruifang He and Yaru Wang and Dawei Song and Peng Zhang and Yuan Jia and Aijun Li",
  note = "Publisher Copyright: {\textcopyright} 2018 ACM.",
  year = "2018",
  month = jul,
  doi = "10.1145/3196278",
  language = "English",
  volume = "17",
  journal = "ACM Transactions on Asian and Low-Resource Language Information Processing",
  issn = "2375-4699",
  publisher = "Association for Computing Machinery (ACM)",
  number = "4",
}

TY  - JOUR
T1  - A dependency parser for spontaneous Chinese spoken language
AU  - He, Ruifang
AU  - Wang, Yaru
AU  - Song, Dawei
AU  - Zhang, Peng
AU  - Jia, Yuan
AU  - Li, Aijun
PY  - 2018/7
Y1  - 2018/7
N2  - Dependency analysis is vital for spoken language understanding in spoken dialogue systems. However, existing research has mainly focused on western spoken languages, Japanese, and so on. Little research has been done for spoken Chinese in terms of dependency parsing. Therefore, the new spoken corpus, D-ESCSC (Dependency-Expressive Speech Corpus of Standard Chinese) is built by adding new dependency relations special to spoken Chinese based on a written Chinese annotation scheme. Since spoken Chinese contains typical ill-grammatical phenomena, e.g., translocation, repetition, duplication, and omission, the new atom feature related to punctuation and three feature templates are proposed to improve a graph-based dependency parser. Experimental results on spoken Chinese corpus show that the atom feature and three templates really work and the new parser outperforms the baseline parser. To our best knowledge, it is the first work to report dependency parsing results of spoken Chinese.
AB  - Dependency analysis is vital for spoken language understanding in spoken dialogue systems. However, existing research has mainly focused on western spoken languages, Japanese, and so on. Little research has been done for spoken Chinese in terms of dependency parsing. Therefore, the new spoken corpus, D-ESCSC (Dependency-Expressive Speech Corpus of Standard Chinese) is built by adding new dependency relations special to spoken Chinese based on a written Chinese annotation scheme. Since spoken Chinese contains typical ill-grammatical phenomena, e.g., translocation, repetition, duplication, and omission, the new atom feature related to punctuation and three feature templates are proposed to improve a graph-based dependency parser. Experimental results on spoken Chinese corpus show that the atom feature and three templates really work and the new parser outperforms the baseline parser. To our best knowledge, it is the first work to report dependency parsing results of spoken Chinese.
KW  - Dependency parsing
KW  - Graph-based model
KW  - Spoken language
KW  - Spontaneous Chinese
UR  - http://www.scopus.com/inward/record.url?scp=85053373843&partnerID=8YFLogxK
U2  - 10.1145/3196278
DO  - 10.1145/3196278
M3  - Article
AN  - SCOPUS:85053373843
SN  - 2375-4699
VL  - 17
JO  - ACM Transactions on Asian and Low-Resource Language Information Processing
JF  - ACM Transactions on Asian and Low-Resource Language Information Processing
IS  - 4
M1  - 28
ER  -