TY - JOUR
T1 - TigCLaF
T2 - a cross-lingual large language model framework for sentiment-aware text classification in low-resource Tigrigna
AU - Gebremeskel, Hagos Gebremedhin
AU - Feng, Chong
AU - Abera, Asefa Mebrahtu
AU - Binegde, Geleta Negasa
N1 - Publisher Copyright: © The Author(s) 2026.
PY - 2026/12
Y1 - 2026/12
AB - This paper introduces TigCLaF, a novel cross-lingual large language model framework for sentiment-aware text classification in low-resource Tigrigna. The framework integrates tokenizer extension with adaptation, continual pretraining on unlabeled Tigrigna data, and LoRA-based parameter-efficient fine-tuning to enable effective cross-lingual adaptation from high-resource languages. Leveraging recent advances in multilingual pre-trained language models and large language models, we investigate zero-shot, few-shot, and full fine-tuning strategies for sentiment detection, incorporating the transformer models XLM-RoBERTa and AfriBERTa, as well as instruction-tuned LLaMA models. Our approach integrates a Tigrigna sentiment lexicon into transformer-based embeddings via feature fusion, thereby enhancing preservation of the sentiment signal during cross-lingual transfer. The proposed framework is evaluated on a newly curated dataset of 30,000 Tigrigna instances, supported by auxiliary English and Amharic sentiment corpora for transfer learning. Experimental results show that sentiment-aware feature integration improves classification accuracy and Macro-F1 by up to 7% over baseline multilingual models without sentiment augmentation. Furthermore, LoRA-based parameter-efficient fine-tuning achieved competitive accuracy while reducing model size and inference latency, making it suitable for computationally constrained settings. The systematic error analysis highlights the roles of script-specific preprocessing, idiomatic expressions, and nuances of cultural sentiment in classification performance. The proposed framework demonstrates the viability of combining LLM-based cross-lingual transfer with masked sentiment-aware enhancements for practical, resource-efficient NLP applications in low-resource language contexts.
KW - Cross-Lingual Representation Learning
KW - Low-Resource Languages
KW - Multilingual Adaptation
KW - Sentiment-Aware Classification
KW - Tigrigna
UR - https://www.scopus.com/pages/publications/105036351742
U2 - 10.1038/s41598-026-42786-4
DO - 10.1038/s41598-026-42786-4
M3 - Article
C2 - 41807496
AN - SCOPUS:105036351742
SN - 2045-2322
VL - 16
JO - Scientific Reports
JF - Scientific Reports
IS - 1
M1 - 12953
ER -