An Enhanced New Word Identification Approach Using Bilingual Alignment

Ziyan Yang, Huaping Zhang*, Jianyun Shang, Silamu Wushour

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

Abstract

Traditional new word detection has focused on finding the positional distribution of new words in Chinese text, and has rarely addressed other languages. It has also been difficult to obtain semantic information or translations of these new words. This paper proposes NEWBA, an enhanced new word identification algorithm that uses bilingual corpus alignment. The paper reports that NEWBA performs better than the traditional unsupervised method. In addition, NEWBA can obtain bilingual word pairs, which provide translations on top of detection. NEWBA can therefore expand the scope of traditional new word detection and obtain more valuable information from bilingual aligned corpora.

Original language: English
Title of host publication: Natural Language Processing and Chinese Computing - 11th CCF International Conference, NLPCC 2022, Proceedings
Editors: Wei Lu, Shujian Huang, Yu Hong, Xiabing Zhou
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 92-104
Number of pages: 13
ISBN (Print): 9783031171192
Publication status: Published - 2022
Event: 11th CCF International Conference on Natural Language Processing and Chinese Computing, NLPCC 2022 - Guilin, China
Duration: 24 Sept 2022 to 25 Sept 2022

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 13551 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 11th CCF International Conference on Natural Language Processing and Chinese Computing, NLPCC 2022
Country/Territory: China
City: Guilin
Period: 24/09/22 to 25/09/22

Keywords

  • Bilingual corpus alignment
  • Multilingual
  • New word detection
