
Context enhanced keyword extraction for sparse geo-entity relation from web texts

  • Li Yu
  • Feng Lu*
  • Xueying Zhang
  • Xiliang Liu
  • *Corresponding author for this work
  • CAS - Institute of Geographical Sciences and Natural Resources Research
  • Nanjing Normal University
  • Fujian Collaborative Innovation Center for Big Data Applications in Governments

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

Abstract

Geo-entity relation recognition from rich texts requires robust and effective solutions for keyword extraction. Compared with supervised learning methods, unsupervised learning methods attract more attention for their ability to capture dynamic feature variation in text and to discover additional relation types. Frequency-based keyword extraction methods have been widely studied. However, they are difficult to apply directly to geo-entity keyword extraction because geo-entity relations are sparsely distributed in texts. In addition, few studies address Chinese keyword extraction. This paper proposes a context-enhanced keyword extraction method. First, the contexts for geo-entities are enhanced to reduce the sparseness of terms. Second, two well-known frequency-based statistical methods (i.e., DF and Entropy) are used to build a large-scale corpus automatically from the enhanced contexts. Third, the lexical features and their weights are statistically determined from the corpus to make the terms more distinguishable. Finally, all terms in the enhanced contexts are measured with the lexical features, and the most important terms are selected as the keywords of geo-entity pairs. Experiments are conducted on a large volume of real Chinese web texts. Compared with DF and Entropy, the presented method improves precision by 41% and 36%, respectively, in discovering sparsely distributed keywords, and generates an additional 60% of correct keywords for geo-entity relation recognition.
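
The abstract names DF and Entropy as the frequency-based baselines but does not define them on this page. The following is a minimal sketch only, assuming DF means document frequency over tokenized contexts and Entropy means the Shannon entropy of a term's occurrences across contexts. The function names and the toy English contexts are hypothetical, and this is not the authors' implementation (the paper works on Chinese web texts).

```python
import math
from collections import Counter


def document_frequency(contexts):
    """Count, for each term, how many contexts contain it (assumed meaning of DF)."""
    df = Counter()
    for terms in contexts:
        df.update(set(terms))  # a term counts once per context
    return df


def term_entropy(contexts):
    """Shannon entropy (base 2) of each term's occurrence counts across contexts.

    Assumed meaning of "Entropy": a term spread evenly over many contexts
    scores high, a term confined to one context scores zero.
    """
    per_term = {}
    for idx, terms in enumerate(contexts):
        for term, count in Counter(terms).items():
            per_term.setdefault(term, {})[idx] = count

    entropy = {}
    for term, counts in per_term.items():
        total = sum(counts.values())
        entropy[term] = -sum(
            (c / total) * math.log2(c / total) for c in counts.values()
        )
    return entropy


# Toy, already-tokenized contexts around hypothetical geo-entity pairs.
contexts = [
    ["river", "flows", "into", "lake"],
    ["river", "flows", "through", "city"],
    ["city", "located", "near", "lake"],
]

df = document_frequency(contexts)
ent = term_entropy(contexts)
```

On sparse data most relation terms occur in only a few contexts, so both scores stay close to their minimum for many candidates. That is the sparseness problem the abstract says context enhancement is meant to reduce.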

Original language: English
Title of host publication: Web Technologies and Applications - APWeb 2016 Workshops, WDMA, GAP, and SDMA, Proceedings
Editors: Jia Zhu, Rong Zhang, Lijun Chang, Wenjie Zhang, Kuien Liu, Atsuyuki Morishima, Tom Z.J. Fu, Xiaoyan Yang, Zhiwei Zhang
Publisher: Springer Verlag
Pages: 253-264
Number of pages: 12
ISBN (Print): 9783319458342
DOIs
Publication status: Published - 2016
Externally published: Yes
Event: 18th International Conference on Web Technologies and Applications, APWeb 2016 and Workshop on 2nd International Workshop on Web Data Mining and Applications, WDMA 2016 and 1st International Workshop on Graph Analytics and Query Processing, GAP 2016 and 1st International Workshop on Spatial-temporal Data Management and Analytics, SDMA 2016 - Suzhou, China
Duration: 23 Sept 2016 to 25 Sept 2016

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 9865 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 18th International Conference on Web Technologies and Applications, APWeb 2016 and Workshop on 2nd International Workshop on Web Data Mining and Applications, WDMA 2016 and 1st International Workshop on Graph Analytics and Query Processing, GAP 2016 and 1st International Workshop on Spatial-temporal Data Management and Analytics, SDMA 2016
Country/Territory: China
City: Suzhou
Period: 23/09/16 to 25/09/16

Keywords

  • Context enhancement
  • Geo-entity relation
  • Geographical information retrieval
  • Keyword extraction
  • Text mining
