A bootstrapping algorithm for geo-entity relation extraction from online encyclopedia

Li Yu, Feng Lu*

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

2 Citations (Scopus)

Abstract

Extracting spatial and semantic relations between geo-entities from web texts is a core problem of geographical information retrieval. The primary methods are pattern matching and supervised learning. Because patterns adapt poorly and therefore have limited coverage, and supervised learning needs large amounts of labeled data that are expensive to produce, both struggle to process massive and diverse web texts. Inspired by frequency statistics, an important technique in unsupervised relation extraction, this paper puts forward a novel approach for automatically extracting geo-entity relations without much manual effort. First, we recast relation extraction as a keyword extraction problem and analyze the characteristics of words (part of speech, position, and distance to the entities) by means of bootstrapping. Second, we calculate the weight of each word in a sentence containing a pair of geo-entities, based on the statistics of these characteristics, and pick the word with the maximum weight as the relation for that pair. Lastly, we construct relation instances to obtain structured information. In the experiments, we applied bootstrapping to Chinese texts from the popular Sina Travel and BaiduBaike sites, evaluated precision and recall, and compared the results with three frequency statistics approaches (Frequency, TF-IDF, and PPMI). The presented method has the following advantages: (1) it automatically explores lexical features from natural language texts, requiring neither domain expert knowledge nor large-scale corpora, and it breaks the restriction of closed relation types; (2) compared with the three classical frequency statistics methods, precision and recall are improved by 5% and 23%, respectively.
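
The word-weighting step can be illustrated with a minimal Python sketch. The abstract does not give the weighting formula, so the product of add-one-smoothed feature frequencies used here is an assumption, as are all names (`Token`, `features`, `learn_stats`, `weight`, `extract_relation`, `bootstrap`), the coarse position classes, and the toy English sentences standing in for the Chinese corpus. It is meant only to show the shape of the idea, not the authors' implementation.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    text: str
    pos: str  # part-of-speech tag, e.g. "v" (verb), "ns" (place name)


def features(tokens, i, e1, e2):
    """Features of token i relative to entities at indices e1 < e2."""
    if i < e1:
        position = "before"
    elif i > e2:
        position = "after"
    else:
        position = "between"
    distance = min(abs(i - e1), abs(i - e2))
    return {"pos": tokens[i].pos, "position": position, "distance": distance}


def learn_stats(examples):
    """Count feature values of known relation keywords.

    examples: list of (tokens, e1, e2, keyword_index).
    """
    stats = {"pos": Counter(), "position": Counter(), "distance": Counter()}
    for tokens, e1, e2, k in examples:
        for name, value in features(tokens, k, e1, e2).items():
            stats[name][value] += 1
    return stats


def weight(stats, feats):
    """Assumed weighting: product of add-one-smoothed relative frequencies."""
    w = 1.0
    for name, value in feats.items():
        total = sum(stats[name].values())
        w *= (stats[name][value] + 1) / (total + len(stats[name]) + 1)
    return w


def best_index(tokens, e1, e2, stats):
    """Index of the non-entity token with the maximum weight."""
    candidates = [i for i in range(len(tokens)) if i not in (e1, e2)]
    return max(candidates, key=lambda i: weight(stats, features(tokens, i, e1, e2)))


def extract_relation(tokens, e1, e2, stats):
    return tokens[best_index(tokens, e1, e2, stats)].text


def bootstrap(seeds, unlabeled, rounds=2):
    """Re-estimate statistics from seeds plus the model's own extractions."""
    stats = learn_stats(seeds)
    for _ in range(rounds):
        guessed = [(t, e1, e2, best_index(t, e1, e2, stats)) for t, e1, e2 in unlabeled]
        stats = learn_stats(list(seeds) + guessed)
    return stats


# Hypothetical seed sentences with the relation keyword marked by index.
SEEDS = [
    ([Token("Beijing", "ns"), Token("borders", "v"), Token("Hebei", "ns")], 0, 2, 1),
    ([Token("Shanghai", "ns"), Token("lies", "v"), Token("near", "p"),
      Token("Suzhou", "ns")], 0, 3, 1),
]
SENTENCE = [Token("The", "d"), Token("river", "n"), Token("Yangtze", "ns"),
            Token("crosses", "v"), Token("Wuhan", "ns")]

STATS = bootstrap(SEEDS, [(SENTENCE, 2, 4)], rounds=1)
RELATION = extract_relation(SENTENCE, 2, 4, STATS)
```

In this toy run, `RELATION` is the verb lying between the two entities, because the seed keywords are all verbs at distance 1 between their entity pairs. A real system would need a Chinese word segmenter and part-of-speech tagger, and some filtering of low-confidence extractions before they are fed back into the statistics.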

Original language: English
Title of host publication: Proceedings - 23rd International Conference on Geoinformatics 2015, Geoinformatics 2015
Editors: Shixiong Hu, Xinyue Ye
Publisher: IEEE Computer Society
ISBN (Electronic): 9781467376631
Publication status: Published - 11 Jan 2016
Externally published: Yes
Event: 23rd International Conference on Geoinformatics, Geoinformatics 2015 - Wuhan, China
Duration: 19 Jun 2015 to 21 Jun 2015

Publication series

Name: International Conference on Geoinformatics
Volume: 2016-January
ISSN (Print): 2161-024X
ISSN (Electronic): 2161-0258

Conference

Conference: 23rd International Conference on Geoinformatics, Geoinformatics 2015
Country/Territory: China
City: Wuhan
Period: 19/06/15 to 21/06/15

Keywords

  • bootstrapping
  • geo-entity
  • relation extraction
  • text mining
  • web texts
