
TigCLaF: a cross-lingual large language model framework for sentiment-aware text classification in low-resource Tigrigna

  • Hagos Gebremedhin Gebremeskel*
  • Chong Feng*
  • Asefa Mebrahtu Abera
  • Geleta Negasa Binegde
  • *Corresponding author for this work
  • Beijing Institute of Technology
  • Mekelle University
  • Aksum University

Research output: Contribution to journal, Article, peer-reviewed

Abstract

This paper introduces TigCLaF, a cross-lingual large language model framework for sentiment-aware text classification in low-resource Tigrigna. The framework combines tokenizer extension with adaptation, continual pretraining on unlabeled Tigrigna data, and LoRA-based parameter-efficient fine-tuning to enable effective cross-lingual adaptation from high-resource languages. Building on recent multilingual pre-trained language models and large language models, we investigate zero-shot, few-shot, and full fine-tuning strategies for sentiment detection, using the transformer models XLM-RoBERTa and AfriBERTa as well as instruction-tuned LLaMA models. Our approach integrates a Tigrigna sentiment lexicon into transformer-based embeddings via feature fusion, which helps preserve the sentiment signal during cross-lingual transfer. The framework is evaluated on a newly curated dataset of 30,000 Tigrigna instances, supported by auxiliary English and Amharic sentiment corpora for transfer learning. Experimental results show that sentiment-aware feature integration improves classification accuracy and Macro-F1 by up to 7% over baseline multilingual models without sentiment augmentation. Furthermore, LoRA-based parameter-efficient fine-tuning achieves competitive accuracy while reducing model size and inference latency, making it suitable for computationally constrained settings. A systematic error analysis highlights the roles of script-specific preprocessing, idiomatic expressions, and culturally specific sentiment nuances in classification performance. The proposed framework demonstrates the viability of combining LLM-based cross-lingual transfer with masked sentiment-aware enhancements for practical, resource-efficient NLP applications in low-resource language contexts.
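The lexicon feature fusion mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state the fusion operator, so this example assumes simple concatenation of a sentence embedding with three lexicon-derived ratios. The names `TOY_LEXICON`, `lexicon_features`, and `fuse` are hypothetical, the lexicon entries are ASCII placeholders rather than real Tigrigna words, and the embedding values are invented stand-ins for encoder output.

```python
from typing import Dict, List

# Hypothetical toy lexicon: token -> polarity (+1 positive, -1 negative).
# ASCII placeholders stand in for real Tigrigna lexicon entries.
TOY_LEXICON: Dict[str, int] = {"good": 1, "great": 1, "bad": -1, "awful": -1}


def lexicon_features(tokens: List[str], lexicon: Dict[str, int]) -> List[float]:
    """Return [positive ratio, negative ratio, net polarity] for a token list."""
    if not tokens:
        # No tokens means no sentiment evidence: all features are zero.
        return [0.0, 0.0, 0.0]
    pos = sum(1 for t in tokens if lexicon.get(t, 0) > 0)
    neg = sum(1 for t in tokens if lexicon.get(t, 0) < 0)
    n = len(tokens)
    return [pos / n, neg / n, (pos - neg) / n]


def fuse(embedding: List[float], tokens: List[str],
         lexicon: Dict[str, int]) -> List[float]:
    """Concatenate a sentence embedding with lexicon-derived features.

    Concatenation is an assumption made for this sketch; the source only
    says "feature fusion" without naming the operator.
    """
    return list(embedding) + lexicon_features(tokens, lexicon)


if __name__ == "__main__":
    # Invented stand-in for a transformer sentence embedding.
    sentence_embedding = [0.12, -0.40, 0.33, 0.08]
    tokens = ["good", "service", "bad", "great"]
    fused = fuse(sentence_embedding, tokens, TOY_LEXICON)
    print(len(fused), fused)
```

In a setup like this, the fused vector (here 4 embedding dimensions plus 3 lexicon features) would be passed to the classification head in place of the raw embedding, so the explicit polarity signal is available even when the cross-lingual encoder represents a Tigrigna token poorly.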

Original language: English
Article number: 12953
Journal: Scientific Reports
Volume: 16
Issue: 1
DOI
Publication status: Published - December 2026
Externally published
