TopicBERT: A Topic-Enhanced Neural Language Model Fine-Tuned for Sentiment Classification

Yuxiang Zhou; Lejian Liao; Yang Gao; Rui Wang; Heyan Huang

doi:10.1109/TNNLS.2021.3094987

Yuxiang Zhou, Lejian Liao, Yang Gao^*, Rui Wang, Heyan Huang

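^*Corresponding author for this work

School of Computer Science

Research output: Contribution to journal › Article › peer-review

23 Citations (Scopus)

Abstract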

Sentiment classification is a form of data analytics where people's feelings and attitudes toward a topic are mined from data. This tantalizing power to 'predict the zeitgeist' means that sentiment classification has long attracted interest, but with mixed results. However, the recent development of the BERT framework and its pretrained neural language models is seeing new-found success for sentiment classification. BERT models are trained to capture word-level information via mask language modeling and sentence-level contexts via next sentence prediction tasks. Out of the box, they are adequate models for some natural language processing tasks. However, most models are further fine-tuned with domain-specific information to increase accuracy and usefulness. Motivated by the idea that a further fine-tuning step would improve the performance for downstream sentiment classification tasks, we developed TopicBERT - a BERT model fine-tuned to recognize topics at the corpus level in addition to the word and sentence levels. TopicBERT comprises two variants: TopicBERT-ATP (aspect topic prediction), which captures topic information via an auxiliary training task, and TopicBERT-TA, where topic representation is directly injected into a topic augmentation layer for sentiment classification. With TopicBERT-ATP, the topics are predetermined by an LDA mechanism and collapsed Gibbs sampling. With TopicBERT-TA, the topics can change dynamically during the training. Experimental results show that both approaches deliver the state-of-the-art performance in two different domains with SemEval 2014 Task 4. However, in a test of methods, direct augmentation outperforms further training. Comprehensive analyses in the form of ablation, parameter, and complexity studies accompany the results.
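The abstract says that TopicBERT-ATP takes its topics from LDA fitted with collapsed Gibbs sampling. As a rough illustration of that general technique only, the sketch below is a textbook-style collapsed Gibbs sampler for LDA in plain Python. It is not the authors' implementation: the function name `lda_collapsed_gibbs`, the hyperparameter values `alpha` and `beta`, the iteration count, and the toy corpus are all assumptions made for this example.

```python
import random


def lda_collapsed_gibbs(docs, num_topics, vocab_size,
                        alpha=0.1, beta=0.01, iters=50, seed=0):
    """Generic collapsed Gibbs sampler for LDA (illustrative sketch).

    docs is a list of documents, each a list of integer word ids in
    [0, vocab_size). Returns (doc_topic, topic_word, assignments).
    """
    rng = random.Random(seed)
    doc_topic = [[0] * num_topics for _ in docs]                # tokens of doc d in topic k
    topic_word = [[0] * vocab_size for _ in range(num_topics)]  # word w counts in topic k
    topic_total = [0] * num_topics                              # total tokens in topic k
    assignments = []

    # Random initialisation of the topic assignment of every token.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(z_d)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the current token from all counts.
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1

                # Unnormalised conditional for each topic t:
                # (n_dt + alpha) * (n_tw + beta) / (n_t + V * beta)
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]

                # Resample the topic and add the token back.
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1

    return doc_topic, topic_word, assignments


# Toy usage: three tiny "documents" over a five-word vocabulary (made-up data).
toy_docs = [[0, 1, 2, 0], [3, 4, 3], [0, 2, 4, 1, 1]]
doc_topic, topic_word, z = lda_collapsed_gibbs(toy_docs, num_topics=2, vocab_size=5)
print(doc_topic)
```

Each row of `doc_topic` gives the per-document topic counts. How the paper turns such topic assignments into prediction targets for the auxiliary task is not described in this record, so that step is left out here.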

Original language	English
Pages (from-to)	380-393
Number of pages	14
Journal	IEEE Transactions on Neural Networks and Learning Systems
Volume	34
Issue number	1
DOI	https://doi.org/10.1109/TNNLS.2021.3094987
Publication status	Published - 1 Jan 2023

Access to Document

10.1109/TNNLS.2021.3094987

Other files and links

Link to publication in Scopus

Cite this

Zhou, Y., Liao, L., Gao, Y., Wang, R., & Huang, H. (2023). TopicBERT: A Topic-Enhanced Neural Language Model Fine-Tuned for Sentiment Classification. IEEE Transactions on Neural Networks and Learning Systems, 34(1), 380-393. https://doi.org/10.1109/TNNLS.2021.3094987

@article{bafe4d1f5d284a97ace7e20ce4cb9097,
title = "{TopicBERT}: A Topic-Enhanced Neural Language Model Fine-Tuned for Sentiment Classification",
abstract = "Sentiment classification is a form of data analytics where people's feelings and attitudes toward a topic are mined from data. This tantalizing power to 'predict the zeitgeist' means that sentiment classification has long attracted interest, but with mixed results. However, the recent development of the BERT framework and its pretrained neural language models is seeing new-found success for sentiment classification. BERT models are trained to capture word-level information via mask language modeling and sentence-level contexts via next sentence prediction tasks. Out of the box, they are adequate models for some natural language processing tasks. However, most models are further fine-tuned with domain-specific information to increase accuracy and usefulness. Motivated by the idea that a further fine-tuning step would improve the performance for downstream sentiment classification tasks, we developed TopicBERT - a BERT model fine-tuned to recognize topics at the corpus level in addition to the word and sentence levels. TopicBERT comprises two variants: TopicBERT-ATP (aspect topic prediction), which captures topic information via an auxiliary training task, and TopicBERT-TA, where topic representation is directly injected into a topic augmentation layer for sentiment classification. With TopicBERT-ATP, the topics are predetermined by an LDA mechanism and collapsed Gibbs sampling. With TopicBERT-TA, the topics can change dynamically during the training. Experimental results show that both approaches deliver the state-of-the-art performance in two different domains with SemEval 2014 Task 4. However, in a test of methods, direct augmentation outperforms further training. Comprehensive analyses in the form of ablation, parameter, and complexity studies accompany the results.",
keywords = "Bidirectional encoder representations from transformers (BERT), pretrained neural language model, sentiment classification, topic-enhanced neural network",
author = "Yuxiang Zhou and Lejian Liao and Yang Gao and Rui Wang and Heyan Huang",
note = "Publisher Copyright: {\textcopyright} 2012 IEEE.",
year = "2023",
month = jan,
day = "1",
doi = "10.1109/TNNLS.2021.3094987",
language = "English",
volume = "34",
pages = "380--393",
journal = "IEEE Transactions on Neural Networks and Learning Systems",
issn = "2162-237X",
publisher = "IEEE Computational Intelligence Society",
number = "1",
}

TY - JOUR

T1 - TopicBERT: A Topic-Enhanced Neural Language Model Fine-Tuned for Sentiment Classification

AU - Zhou, Yuxiang

AU - Liao, Lejian

AU - Gao, Yang

AU - Wang, Rui

AU - Huang, Heyan

PY - 2023/1/1

Y1 - 2023/1/1

AB - Sentiment classification is a form of data analytics where people's feelings and attitudes toward a topic are mined from data. This tantalizing power to 'predict the zeitgeist' means that sentiment classification has long attracted interest, but with mixed results. However, the recent development of the BERT framework and its pretrained neural language models is seeing new-found success for sentiment classification. BERT models are trained to capture word-level information via mask language modeling and sentence-level contexts via next sentence prediction tasks. Out of the box, they are adequate models for some natural language processing tasks. However, most models are further fine-tuned with domain-specific information to increase accuracy and usefulness. Motivated by the idea that a further fine-tuning step would improve the performance for downstream sentiment classification tasks, we developed TopicBERT - a BERT model fine-tuned to recognize topics at the corpus level in addition to the word and sentence levels. TopicBERT comprises two variants: TopicBERT-ATP (aspect topic prediction), which captures topic information via an auxiliary training task, and TopicBERT-TA, where topic representation is directly injected into a topic augmentation layer for sentiment classification. With TopicBERT-ATP, the topics are predetermined by an LDA mechanism and collapsed Gibbs sampling. With TopicBERT-TA, the topics can change dynamically during the training. Experimental results show that both approaches deliver the state-of-the-art performance in two different domains with SemEval 2014 Task 4. However, in a test of methods, direct augmentation outperforms further training. Comprehensive analyses in the form of ablation, parameter, and complexity studies accompany the results.

KW - Bidirectional encoder representations from transformers (BERT)

KW - pretrained neural language model

KW - sentiment classification

KW - topic-enhanced neural network

UR - http://www.scopus.com/inward/record.url?scp=85112172672&partnerID=8YFLogxK

U2 - 10.1109/TNNLS.2021.3094987

DO - 10.1109/TNNLS.2021.3094987

M3 - Article

C2 - 34357867

AN - SCOPUS:85112172672

SN - 2162-237X

VL - 34

SP - 380

EP - 393

JO - IEEE Transactions on Neural Networks and Learning Systems

JF - IEEE Transactions on Neural Networks and Learning Systems

IS - 1

ER -
