Neural Chinese word segmentation as sequence to sequence translation

Xuewen Shi, Heyan Huang, Ping Jian*, Yuhang Guo, Xiaochi Wei, Yi Kun Tang

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review

6 Citations (Scopus)

Abstract

Recently, Chinese word segmentation (CWS) methods using neural networks have made impressive progress. Most of them treat CWS as a sequence labeling problem and build models on local features rather than on global information from the input sequence. In this paper, we cast CWS as a sequence translation problem and propose a novel sequence-to-sequence CWS model with an attention-based encoder-decoder framework. The model captures global information from the input and directly outputs the segmented sequence. It can also tackle other NLP tasks jointly with CWS in an end-to-end manner. Experiments on the Weibo, PKU and MSRA benchmark datasets show that our approach achieves competitive performance compared with state-of-the-art methods. We also successfully applied the proposed model to jointly learning CWS and Chinese spelling correction, which demonstrates its applicability to multi-task fusion.
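To make the contrast in the abstract concrete, the following minimal sketch shows how one segmented sentence could be turned into training data under each view: per-character tags for sequence labeling, and a source/target pair for sequence translation. The abstract does not describe the actual target format, so the boundary token `SEP`, the BMES tag scheme, and all function names here are illustrative assumptions, not the authors' implementation.

```python
from typing import List, Tuple

# Hypothetical word-boundary token; the abstract does not name one.
SEP = "</w>"


def to_bmes(words: List[str]) -> List[str]:
    """Sequence-labeling view: one B/M/E/S tag per character."""
    tags: List[str] = []
    for w in words:
        if len(w) == 1:
            tags.append("S")  # single-character word
        else:
            tags.extend(["B"] + ["M"] * (len(w) - 2) + ["E"])
    return tags


def to_seq2seq_pair(words: List[str]) -> Tuple[List[str], List[str]]:
    """Translation view: raw characters in, characters plus boundary tokens out."""
    source = [ch for w in words for ch in w]
    target: List[str] = []
    for w in words:
        target.extend(w)    # copy the word's characters
        target.append(SEP)  # mark the end of the word
    return source, target


def decode(target: List[str]) -> List[str]:
    """Recover the word list from a (predicted) target token sequence."""
    words: List[str] = []
    current: List[str] = []
    for tok in target:
        if tok == SEP:
            if current:
                words.append("".join(current))
                current = []
        else:
            current.append(tok)
    if current:  # tolerate a missing final boundary token
        words.append("".join(current))
    return words


words = ["我们", "喜欢", "自然语言"]
tags = to_bmes(words)
source, target = to_seq2seq_pair(words)
```

Because the target side is free text rather than a fixed tag per input character, a pair whose source contains a misspelled character and whose target contains the corrected, segmented sentence fits the same format, which is one plausible reading of how segmentation and spelling correction can be learned jointly.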

Original language: English
Title of host publication: Social Media Processing - 6th National Conference, SMP 2017, Proceedings
Editors: Huan Liu, Xing Xie, Xueqi Cheng, Huawei Shen, Weiying Ma, Shizheng Feng
Publisher: Springer Verlag
Pages: 91-103
Number of pages: 13
ISBN (Print): 9789811068041
Publication status: Published - 2017
Event: 6th National Conference on Social Media Processing, SMP 2017 - Beijing, China
Duration: 14 Sept 2017 to 17 Sept 2017

Publication series

Name: Communications in Computer and Information Science
Volume: 774
ISSN (Print): 1865-0929

Conference

Conference: 6th National Conference on Social Media Processing, SMP 2017
Country/Territory: China
City: Beijing
Period: 14/09/17 to 17/09/17

Keywords

  • Chinese spelling correction
  • Chinese word segmentation
  • Natural language processing
  • Sequence-to-sequence
