TY - JOUR
T1 - Developing Lexicons for Enhanced Sentiment Analysis in Software Engineering
T2 - An Innovative Multilingual Approach for Social Media Reviews
AU - Khan, Zohaib Ahmad
AU - Xia, Yuanqing
AU - Khan, Ahmed
AU - Sadiq, Muhammad
AU - Alam, Mahmood
AU - Awwad, Fuad A.
AU - Ismail, Emad A.A.
N1 - Publisher Copyright: © 2024 Tech Science Press. All rights reserved.
PY - 2024
Y1 - 2024
N2 - Sentiment analysis is becoming increasingly important in today’s digital age, with social media being a significant source of user-generated content. Developing sentiment lexicons that support languages other than English is a challenging task, especially for analyzing sentiment in social media reviews. Most existing sentiment analysis systems focus on English, leaving a significant research gap in other languages due to limited resources and tools. This research aims to address this gap by building a sentiment lexicon for local languages, which is then used with a machine learning algorithm for efficient sentiment analysis. In the first step, a lexicon is developed that covers five languages: Urdu, Roman Urdu, Pashto, Roman Pashto, and English. The sentiment scores from SentiWordNet are associated with each word in the lexicon to produce an effective sentiment score. In the second step, a naive Bayesian algorithm is applied to the developed lexicon for efficient sentiment analysis of Roman Pashto. Both the sentiment lexicon and sentiment analysis steps were evaluated using information retrieval metrics, with an accuracy score of 0.89 for the sentiment lexicon and 0.83 for the sentiment analysis. The results showcase the potential for improving software engineering tasks related to user feedback analysis and product development.
KW - Emotional assessment
KW - lexicons
KW - naive Bayesian technique
KW - regional dialects
KW - SentiWordNet
KW - software engineering
KW - user feedback
UR - http://www.scopus.com/inward/record.url?scp=85193059038&partnerID=8YFLogxK
U2 - 10.32604/cmc.2024.046897
DO - 10.32604/cmc.2024.046897
M3 - Article
AN - SCOPUS:85193059038
SN - 1546-2218
VL - 79
SP - 2771
EP - 2793
JO - Computers, Materials and Continua
JF - Computers, Materials and Continua
IS - 2
ER -