TY - GEN
T1 - BIT-Event at NLPCC-2021 Task 3
T2 - 10th CCF International Conference on Natural Language Processing and Chinese Computing, NLPCC 2021
AU - Liu, Xiao
AU - Shi, Ge
AU - Wang, Bo
AU - Yuan, Changsen
AU - Huang, Heyan
AU - Feng, Chong
AU - Wu, Lifang
N1 - Publisher Copyright:
© 2021, Springer Nature Switzerland AG.
PY - 2021
Y1 - 2021
N2 - This paper describes the system proposed by the BIT-Event team for the NLPCC 2021 shared task on Subevent Identification. The task includes two settings: one faces less reliable labeled data, and the other faces the dilemma of selecting the most valuable data to annotate. Without the luxury of abundant training data, we propose a hybrid system based on semi-supervised algorithms that enhances performance by learning effectively from a large unlabeled corpus. In this hybrid model, we first fine-tune the pre-trained model to adapt it to the training-data scenario. In addition, Adversarial Training and Virtual Adversarial Training are combined to strengthen a single model with unlabeled in-domain data, and additional information is captured by retraining with pseudo-labels. For the second setting, we apply Active Learning as an iterative process that starts from a small number of labeled seed instances. The experimental results suggest that semi-supervised methods fit the low-resource subevent identification problem well, and our best results were obtained by an ensemble of these methods. According to the official results, our approach achieved the best performance in both settings of this task.
AB - This paper describes the system proposed by the BIT-Event team for the NLPCC 2021 shared task on Subevent Identification. The task includes two settings: one faces less reliable labeled data, and the other faces the dilemma of selecting the most valuable data to annotate. Without the luxury of abundant training data, we propose a hybrid system based on semi-supervised algorithms that enhances performance by learning effectively from a large unlabeled corpus. In this hybrid model, we first fine-tune the pre-trained model to adapt it to the training-data scenario. In addition, Adversarial Training and Virtual Adversarial Training are combined to strengthen a single model with unlabeled in-domain data, and additional information is captured by retraining with pseudo-labels. For the second setting, we apply Active Learning as an iterative process that starts from a small number of labeled seed instances. The experimental results suggest that semi-supervised methods fit the low-resource subevent identification problem well, and our best results were obtained by an ensemble of these methods. According to the official results, our approach achieved the best performance in both settings of this task.
KW - Active learning
KW - Adversarial training
KW - Semi-supervised
KW - Subevent identification
UR - http://www.scopus.com/inward/record.url?scp=85118170136&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-88483-3_32
DO - 10.1007/978-3-030-88483-3_32
M3 - Conference contribution
AN - SCOPUS:85118170136
SN - 9783030884826
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 400
EP - 411
BT - Natural Language Processing and Chinese Computing - 10th CCF International Conference, NLPCC 2021, Proceedings
A2 - Wang, Lu
A2 - Feng, Yansong
A2 - Hong, Yu
A2 - He, Ruifang
PB - Springer Science and Business Media Deutschland GmbH
Y2 - 13 October 2021 through 17 October 2021
ER -