Fast and Accurate Bilingual Lexicon Induction via Matching Optimization

Zewen Chi, Heyan Huang*, Shenjian Zhao, Heng Da Xu, Xian Ling Mao

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

Abstract

Most recent state-of-the-art approaches to bilingual lexicon induction rely on pre-trained word embeddings. However, word embeddings introduce noise for both frequent and rare words. This is especially true of rare words, whose embeddings are often poorly learned because they occur so infrequently in the training data. To alleviate this problem, we propose BLIMO, a simple yet effective approach to automatic lexicon induction. It does not use word embeddings; instead, it converts lexicon induction into a maximum weighted matching problem, which can be solved efficiently by matching optimization with greedy search. Experiments demonstrate that the proposed method substantially outperforms state-of-the-art baselines on two standard benchmarks.
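As an illustration of the general idea of greedy search for maximum weighted matching, here is a minimal Python sketch. It is not the BLIMO algorithm itself, whose details the abstract does not give. It shows only the textbook greedy approximation: sort candidate word pairs by weight and accept a pair when neither word has been matched yet. The function name `greedy_matching`, the toy vocabulary, and the association scores are all hypothetical.

```python
from typing import Dict, List, Tuple


def greedy_matching(weights: Dict[Tuple[str, str], float]) -> List[Tuple[str, str]]:
    """Greedy approximation to maximum weighted bipartite matching.

    `weights` maps (source_word, target_word) to a hypothetical
    association score; a higher score means a more plausible pair.
    """
    matched_src = set()
    matched_tgt = set()
    lexicon = []
    # Visit candidate pairs from the heaviest edge to the lightest.
    ranked = sorted(weights.items(), key=lambda item: item[1], reverse=True)
    for (src, tgt), _score in ranked:
        # Each word may take part in at most one translation pair.
        if src in matched_src or tgt in matched_tgt:
            continue
        matched_src.add(src)
        matched_tgt.add(tgt)
        lexicon.append((src, tgt))
    return lexicon


# Toy scores, invented for illustration only.
toy_weights = {
    ("dog", "chien"): 0.9,
    ("dog", "chat"): 0.2,
    ("cat", "chat"): 0.8,
    ("cat", "chien"): 0.3,
    ("house", "maison"): 0.7,
    ("house", "chat"): 0.1,
}
pairs = greedy_matching(toy_weights)
```

The greedy pass costs one sort over the candidate pairs and generally yields an approximate rather than optimal matching. That is the usual trade-off when speed matters more than exactness.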

Original language: English
Title of host publication: Natural Language Processing and Chinese Computing - 8th CCF International Conference, NLPCC 2019, Proceedings
Editors: Jie Tang, Min-Yen Kan, Dongyan Zhao, Sujian Li, Hongying Zan
Publisher: Springer
Pages: 737-748
Number of pages: 12
ISBN (Print): 9783030322328
Publication status: Published - 2019
Event: 8th CCF International Conference on Natural Language Processing and Chinese Computing, NLPCC 2019 - Dunhuang, China
Duration: 9 Oct 2019 to 14 Oct 2019

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 11838 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 8th CCF International Conference on Natural Language Processing and Chinese Computing, NLPCC 2019
Country/Territory: China
City: Dunhuang
Period: 9/10/19 to 14/10/19
