TY - GEN
T1 - A bootstrapping algorithm for geo-entity relation extraction from online encyclopedia
AU - Yu, Li
AU - Lu, Feng
N1 - Publisher Copyright:
© 2015 IEEE.
PY - 2016/1/11
Y1 - 2016/1/11
N2 - Extracting spatial and semantic relations between two geo-entities from web texts is a core problem in geographical information retrieval. The primary methods are pattern matching and supervised learning. Because patterns adapt poorly and therefore have limited coverage, and supervised learning requires a large amount of expensive labeled data, both approaches struggle to process massive and diverse web texts. Inspired by frequency statistics, an important technique in unsupervised relation extraction, this paper proposes a novel approach that automatically extracts geo-entity relations with little manual effort. First, we recast relation extraction as a keyword extraction problem and use bootstrapping to analyze word characteristics (part of speech, position, and distance to the entities). Second, based on the statistics of these characteristics, we calculate the weight of each word in a sentence containing a pair of geo-entities and select the word with the maximum weight as the relation between that pair. Finally, we construct relation instances to obtain structured information. In the experiments, we used bootstrapping to evaluate precision and recall on Chinese-language text from the popular Sina Travel and BaiduBaike sites, and compared the method with three frequency statistics approaches (Frequency, TF-IDF, and PPMI). The proposed method is argued to have the following advantages: (1) it automatically explores lexical features from natural language texts, requires neither domain expert knowledge nor large-scale corpora, and removes the restriction to closed relation types; (2) compared with the three classical frequency statistics methods, it improves precision and recall by 5% and 23%, respectively.
AB - Extracting spatial and semantic relations between two geo-entities from web texts is a core problem in geographical information retrieval. The primary methods are pattern matching and supervised learning. Because patterns adapt poorly and therefore have limited coverage, and supervised learning requires a large amount of expensive labeled data, both approaches struggle to process massive and diverse web texts. Inspired by frequency statistics, an important technique in unsupervised relation extraction, this paper proposes a novel approach that automatically extracts geo-entity relations with little manual effort. First, we recast relation extraction as a keyword extraction problem and use bootstrapping to analyze word characteristics (part of speech, position, and distance to the entities). Second, based on the statistics of these characteristics, we calculate the weight of each word in a sentence containing a pair of geo-entities and select the word with the maximum weight as the relation between that pair. Finally, we construct relation instances to obtain structured information. In the experiments, we used bootstrapping to evaluate precision and recall on Chinese-language text from the popular Sina Travel and BaiduBaike sites, and compared the method with three frequency statistics approaches (Frequency, TF-IDF, and PPMI). The proposed method is argued to have the following advantages: (1) it automatically explores lexical features from natural language texts, requires neither domain expert knowledge nor large-scale corpora, and removes the restriction to closed relation types; (2) compared with the three classical frequency statistics methods, it improves precision and recall by 5% and 23%, respectively.
KW - bootstrapping
KW - geo-entity
KW - relation extraction
KW - text mining
KW - web texts
UR - http://www.scopus.com/inward/record.url?scp=84962446597&partnerID=8YFLogxK
U2 - 10.1109/GEOINFORMATICS.2015.7378569
DO - 10.1109/GEOINFORMATICS.2015.7378569
M3 - Conference contribution
AN - SCOPUS:84962446597
T3 - International Conference on Geoinformatics
BT - Proceedings - 23rd International Conference on Geoinformatics 2015, Geoinformatics 2015
A2 - Hu, Shixiong
A2 - Ye, Xinyue
PB - IEEE Computer Society
T2 - 23rd International Conference on Geoinformatics, Geoinformatics 2015
Y2 - 19 June 2015 through 21 June 2015
ER -