Multi-level Chunk-based Constituent-To-Dependency Treebank Transformation for Tibetan Dependency Parsing

Shumin Shi, Dan Luo, Xing Wu, Congjun Long, Heyan Huang

Research output: Contribution to journal › Article › peer-review

Abstract

Dependency parsing is an important task in Natural Language Processing (NLP). However, a mature parser requires a large treebank for training, and such treebanks remain extremely costly to create. Tibetan is an extremely low-resource language for NLP: no Tibetan dependency treebank is currently available, and building one currently depends on manual annotation. Furthermore, there is little related research on treebank construction. We propose a novel multi-level chunk-based syntactic parsing method that performs constituent-to-dependency treebank conversion for Tibetan under resource-scarce conditions. Our method mines more dependencies from Tibetan sentences, builds a high-quality Tibetan dependency tree corpus, and makes fuller use of the inherent regularities of the language itself. We train dependency parsing models on the dependency treebank obtained by this preliminary conversion. The resulting model achieves 86.5% accuracy, 96% LAS, and 97.85% UAS, exceeding the best results of existing conversion methods. The experimental results show that our method is promising in low-resource settings: it not only addresses the scarcity of Tibetan dependency treebanks but also avoids unnecessary manual annotation. The method embodies the regularity of strongly knowledge-guided linguistic analysis and is of great significance for advancing research on Tibetan information processing.
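The abstract does not spell out how the multi-level chunk mechanism works, so the sketch below illustrates only the general idea of constituent-to-dependency conversion using generic head rules (head percolation), a common baseline for this task and not the authors' chunk-based method, followed by the standard definitions of UAS (share of words with the correct head) and LAS (correct head and correct label). The tree encoding, the `HEAD_RULES` table, the function names, and the use of constituent labels as arc labels are all hypothetical simplifications chosen for illustration.

```python
from typing import Dict, List, Tuple, Union

# Assumed encoding: a node is (label, children); a leaf is (pos_tag, word_index),
# with word indices starting at 1 so that 0 can denote the artificial root.
Tree = Tuple[str, Union[List["Tree"], int]]

# Hypothetical head rules: phrase label -> (search direction, preferred child
# labels in priority order). Illustrative placeholders, not the paper's rules.
HEAD_RULES: Dict[str, Tuple[str, List[str]]] = {
    "S": ("right", ["VP"]),
    "VP": ("right", ["V"]),
    "NP": ("right", ["N"]),
}


def head_child(label: str, children: List[Tree]) -> Tree:
    """Pick the head child of a phrase using HEAD_RULES."""
    direction, priorities = HEAD_RULES.get(label, ("right", []))
    ordered = children if direction == "left" else children[::-1]
    for wanted in priorities:
        for child in ordered:
            if child[0] == wanted:
                return child
    return ordered[0]  # fallback: first child in the search direction


def lexical_head(tree: Tree) -> int:
    """Follow head children down to the word that heads this constituent."""
    label, children = tree
    if isinstance(children, int):
        return children
    return lexical_head(head_child(label, children))


def attach(tree: Tree, heads: Dict[int, int], labels: Dict[int, str]) -> None:
    """Make the lexical head of every non-head child depend on the phrase head."""
    label, children = tree
    if isinstance(children, int):
        return
    phrase_head = lexical_head(tree)
    for child in children:
        attach(child, heads, labels)
        dependent = lexical_head(child)
        if dependent != phrase_head:
            heads[dependent] = phrase_head
            # Simplification: reuse the dependent constituent's label as the arc label.
            labels[dependent] = child[0]


def convert(tree: Tree, n_words: int) -> Tuple[List[int], List[str]]:
    """Return per-word head indices (0 = root) and arc labels."""
    heads: Dict[int, int] = {}
    labels: Dict[int, str] = {}
    attach(tree, heads, labels)
    root = lexical_head(tree)
    heads[root], labels[root] = 0, "ROOT"
    words = range(1, n_words + 1)
    return [heads[i] for i in words], [labels[i] for i in words]


def attachment_scores(gold_heads, gold_labels, pred_heads, pred_labels):
    """UAS: share of words with the correct head; LAS: correct head and label."""
    total = len(gold_heads)
    head_ok = [g == p for g, p in zip(gold_heads, pred_heads)]
    uas = sum(head_ok) / total
    las = sum(ok and g == p for ok, g, p in zip(head_ok, gold_labels, pred_labels)) / total
    return uas, las


# Toy verb-final clause: word 1 = subject noun, 2 = object noun, 3 = verb.
EXAMPLE_TREE: Tree = ("S", [("NP", [("N", 1)]), ("VP", [("NP", [("N", 2)]), ("V", 3)])])

if __name__ == "__main__":
    heads, labels = convert(EXAMPLE_TREE, 3)
    print(heads, labels)  # [3, 3, 0] ['NP', 'NP', 'ROOT']
    print(attachment_scores(heads, labels, [3, 3, 0], ["NP", "VP", "ROOT"]))
```

In the toy run, a hypothetical prediction that finds every head but mislabels the object arc scores full UAS but lower LAS, which is why LAS can never exceed UAS for the same output.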

Original language: English
Article number: 26
Journal: ACM Transactions on Asian and Low-Resource Language Information Processing
Volume: 20
Issue number: 2
Publication status: Published, Apr 2021

Keywords

  • Knowledge-driven
  • Low-resource dependency parsing
  • Multi-level chunk mechanism
  • Tibetan dependency trees
