Abstract
Terminology varies greatly across disciplines and annotated corpora are scarce, which limits the portability of information extraction models. The content of scientific articles remains underutilized. This paper presents IEKM-MD, an intelligent platform for information extraction and knowledge mining. It proposes two techniques. First, a phrase-level scientific entity extraction model combines a neural network with active learning, which reduces the model's dependence on large-scale corpora. Second, a translation-based relation prediction model improves the relation embeddings by optimizing the loss function. In addition, the platform integrates an entity recognition model (spaCy.NER) and a keyword extraction model (RAKE). It provides a range of services for fine-grained, multi-dimensional knowledge, including problem discovery, method recognition, relation representation and hot spot detection. Experiments were carried out in three domains: Artificial Intelligence, Nanotechnology and Genetic Engineering. The average accuracies of scientific entity extraction are 0.91, 0.52 and 0.76, respectively.
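The abstract does not say which loss function the relation prediction model optimizes. For orientation only, the sketch below shows the standard TransE formulation that translation-based models commonly start from: a triple (head, relation, tail) is scored by the distance between head + relation and tail, and a margin ranking loss pushes true triples closer than corrupted ones. The function names, the toy embeddings and the margin value are all assumptions made for illustration. This is not the IEKM-MD implementation.

```python
import numpy as np

def transe_score(h, r, t):
    """L2 distance ||h + r - t||; a lower value means a more plausible triple."""
    return np.linalg.norm(h + r - t, ord=2, axis=-1)

def margin_ranking_loss(pos_dist, neg_dist, margin=1.0):
    """Mean hinge loss max(0, margin + d(positive) - d(negative))."""
    return float(np.mean(np.maximum(0.0, margin + pos_dist - neg_dist)))

# Toy 2-dimensional embeddings (hypothetical values, not taken from the paper).
head = np.array([1.0, 0.0])
relation = np.array([0.0, 1.0])
true_tail = np.array([1.0, 1.0])     # equals head + relation exactly
corrupt_tail = np.array([4.0, 5.0])  # far from head + relation

pos = transe_score(head, relation, true_tail)
neg = transe_score(head, relation, corrupt_tail)
loss = margin_ranking_loss(pos, neg, margin=1.0)
```

In this toy case the corrupted triple is already farther away than the true one by more than the margin, so the loss is zero and no update would be needed. Training on real data would sample corrupted triples and update the embeddings by gradient descent on this loss.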
Original language | English |
---|---|
Pages (from-to) | 73-78 |
Number of pages | 6 |
Journal | CEUR Workshop Proceedings |
Volume | 2658 |
Publication status | Published - 2020 |
Externally published | Yes |
Event | 1st Workshop on Extraction and Evaluation of Knowledge Entities from Scientific Documents, EEKE 2020 - Virtual, Online, China. Duration: 1 Aug 2020 → … |
Keywords
- Active learning
- Information extraction
- Neural network
- Relation prediction
- Translation embedding