TY - GEN
T1 - Neural Chinese word segmentation as sequence to sequence translation
AU - Shi, Xuewen
AU - Huang, Heyan
AU - Jian, Ping
AU - Guo, Yuhang
AU - Wei, Xiaochi
AU - Tang, Yi Kun
N1 - Publisher Copyright: © Springer Nature Singapore Pte Ltd. 2017.
PY - 2017
Y1 - 2017
AB - Recently, Chinese word segmentation (CWS) methods using neural networks have made impressive progress. Most of them treat CWS as a sequence labeling problem and construct models based on local features rather than considering global information from the input sequence. In this paper, we cast CWS as a sequence translation problem and propose a novel sequence-to-sequence CWS model with an attention-based encoder-decoder framework. The model captures global information from the input and directly outputs the segmented sequence. It can also tackle other NLP tasks jointly with CWS in an end-to-end manner. Experiments on the Weibo, PKU and MSRA benchmark datasets show that our approach achieves competitive performance compared with state-of-the-art methods. In addition, we successfully applied the proposed model to jointly learn CWS and Chinese spelling correction, which demonstrates its applicability to multi-task fusion.
KW - Chinese spelling correction
KW - Chinese word segmentation
KW - Natural language processing
KW - Sequence-to-sequence
UR - http://www.scopus.com/inward/record.url?scp=85034256945&partnerID=8YFLogxK
U2 - 10.1007/978-981-10-6805-8_8
DO - 10.1007/978-981-10-6805-8_8
M3 - Conference contribution
AN - SCOPUS:85034256945
SN - 9789811068041
T3 - Communications in Computer and Information Science
SP - 91
EP - 103
BT - Social Media Processing - 6th National Conference, SMP 2017, Proceedings
A2 - Liu, Huan
A2 - Xie, Xing
A2 - Cheng, Xueqi
A2 - Shen, Huawei
A2 - Ma, Weiying
A2 - Feng, Shizheng
PB - Springer Verlag
T2 - 6th National Conference on Social Media Processing, SMP 2017
Y2 - 14 September 2017 through 17 September 2017
ER -