Active learning strategies for extracting phrase-level topics from scientific literature

Tao Yue*, Yu Li, Zhang Runjie

*Corresponding author for this work

Research output: Contribution to journal, Article, peer-reviewed

2 Citations (Scopus)

Abstract

[Objective] This paper explores methods for extracting information from scientific literature using active learning strategies, aiming to address the lack of annotated corpora. [Methods] We built a new model from three representative active learning strategies (MARGIN, NSE, MNLP), one novel strategy (LWP), and a CNN-BiLSTM-CRF neural network. We then extracted task-related and method-related information from texts using far fewer annotations. [Results] We evaluated the model on scientific articles in which 10% to 30% of the texts were selectively annotated. The proposed model matched the results of models trained on 100% annotated texts, and it significantly reduced the labor cost of corpus construction. [Limitations] The number of scientific articles in our sample corpus was small, which led to low precision. [Conclusions] The proposed model significantly reduces reliance on the scale of the annotated corpus. Among the active learning strategies compared, MNLP yielded better results and normalizes for sentence length, which improves the model's stability. MARGIN performs well in the initial iteration at identifying low-value instances, while LWP is suitable for datasets with more semantic labels.
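
The selection step behind two of the strategies named in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes MNLP stands for Maximum Normalized Log-Probability and MARGIN for the probability gap between the two most likely label sequences, as those terms are commonly used in the active learning literature. The function names, the `pool` structure, and the probabilities are hypothetical. NSE and LWP are not sketched because the abstract does not define them.

```python
import math
from typing import Dict, List


def mnlp_score(token_log_probs: List[float]) -> float:
    """Length-normalized log-probability of the best label sequence.

    Assumption: token_log_probs holds the model's log-probability for each
    token label on its best path. Dividing by the sentence length keeps
    long sentences from looking uncertain just because they are long.
    """
    return sum(token_log_probs) / len(token_log_probs)


def margin_score(best_seq_prob: float, second_seq_prob: float) -> float:
    """Gap between the two most likely label sequences.

    A small gap means the model can barely separate its top two
    hypotheses, so the sentence is a good candidate for annotation.
    """
    return best_seq_prob - second_seq_prob


def select_for_annotation(pool: Dict[str, List[float]], budget: int) -> List[str]:
    """Return the ids of the `budget` sentences with the lowest MNLP score."""
    ranked = sorted(pool, key=lambda sid: mnlp_score(pool[sid]))
    return ranked[:budget]


# Hypothetical unlabeled pool: sentence id -> per-token log-probabilities.
pool = {
    "s1": [math.log(0.9), math.log(0.8)],
    "s2": [math.log(0.5), math.log(0.6), math.log(0.4), math.log(0.7)],
    "s3": [math.log(0.99)] * 6,
}

chosen = select_for_annotation(pool, budget=1)
print(chosen)
```

In an active learning loop, the selected sentences would be sent to annotators, added to the training set, and the model retrained before the next round of scoring.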

Original language: English
Pages (from-to): 134-143
Number of pages: 10
Journal: Data Analysis and Knowledge Discovery
Volume: 4
Issue number: 10
Publication status: Published - 2020
Externally published: Yes

Keywords

  • Active Learning
  • Information Extraction
  • Neural Network
