An Improved Topic Extraction Method Based on Word Frequency Information Entropy for Multilingual Topic Attentional Division

Yue Yuan, Huaping Zhang*

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

In the contemporary era of ubiquitous global information dissemination, a myriad of news articles are generated worldwide on a daily basis. The topics that capture the attention of different countries diverge due to variances in culture, values, and other influential factors. Analyzing these discrepancies in topic preferences across languages within specific timeframes holds paramount importance for comprehensively understanding and delineating the nuances of diverse national cultures. This paper proposes a novel statistical analysis methodology for extracting multi-language news topic keywords, leveraging the concept of word frequency information entropy. Our approach facilitates the identification of shared topics across different languages, as well as language-specific concerns, within extensive news datasets. Furthermore, we address a prevalent challenge encountered in existing topic modeling methodologies, namely output redundancy. Through the aggregation of synonymous terms, we effectively alleviate redundancy, thereby enhancing the quality of extracted topic keywords. Experimental evaluations are conducted on a meticulously collected multinational news dataset, wherein we assess the effectiveness of our approach in partitioning common and language-specific focus topics across multiple languages, while also quantifying the efficacy of redundancy elimination.
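The abstract does not give the entropy formula, so the following is only a minimal sketch of one way a word-frequency entropy criterion could separate shared from language-specific keywords. It assumes that terms from all languages have already been mapped to a shared vocabulary (for example by translation), that a term's relative frequencies across languages are normalized into a distribution, and that a hypothetical `threshold` on the normalized Shannon entropy decides the split. The names `term_entropy` and `divide_terms` are illustrative and are not taken from the paper.

```python
import math


def term_entropy(rel_freqs):
    """Shannon entropy (in bits) of one term's frequency distribution across languages."""
    total = sum(rel_freqs)
    if total == 0:
        return 0.0
    return -sum((f / total) * math.log2(f / total) for f in rel_freqs if f > 0)


def divide_terms(corpora, threshold=0.8):
    """Split terms into shared and language-specific ones.

    corpora: {language: {term: count}}, with terms already aligned to a
    shared vocabulary (an assumption of this sketch, not of the paper).
    threshold: hypothetical cut-off on entropy normalized to [0, 1].
    """
    langs = sorted(corpora)
    if len(langs) < 2:
        raise ValueError("need at least two languages to compare")
    sizes = {lang: sum(corpora[lang].values()) for lang in langs}
    vocab = set().union(*(corpora[lang] for lang in langs))
    max_entropy = math.log2(len(langs))  # entropy of a uniform distribution

    shared, specific = [], {}
    for term in sorted(vocab):
        # Relative frequency per language, so corpus size does not dominate.
        rel = [
            corpora[lang].get(term, 0) / sizes[lang] if sizes[lang] else 0.0
            for lang in langs
        ]
        if term_entropy(rel) / max_entropy >= threshold:
            shared.append(term)  # attention spread evenly across languages
        else:
            specific[term] = langs[rel.index(max(rel))]  # dominated by one language
    return shared, specific


corpora = {
    "en": {"election": 10, "nfl": 10},
    "zh": {"election": 10, "gaokao": 10},
}
shared, specific = divide_terms(corpora)
print(shared)
print(specific)
```

Under these assumptions, a term that is used evenly across languages has entropy near the maximum and is treated as a shared topic keyword, while a term concentrated in one language has entropy near zero and is assigned to that language. The redundancy-reduction step (merging synonymous terms) described in the abstract is not covered by this sketch.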

Original language: English
Title of host publication: 2024 9th International Conference on Intelligent Computing and Signal Processing, ICSP 2024
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 675-681
Number of pages: 7
ISBN (Electronic): 9798350376548
Publication status: Published - 2024
Event: 9th International Conference on Intelligent Computing and Signal Processing, ICSP 2024 - Hybrid, Xi'an, China
Duration: 19 Apr 2024 to 21 Apr 2024

Publication series

Name: 2024 9th International Conference on Intelligent Computing and Signal Processing, ICSP 2024

Conference

Conference: 9th International Conference on Intelligent Computing and Signal Processing, ICSP 2024
Country/Territory: China
City: Hybrid, Xi'an
Period: 19/04/24 to 21/04/24

Keywords

  • BERTopic
  • Entropy
  • mT5
  • Topic Attentional division
