TY - GEN
T1 - Extracting fine-grained entities based on coordinate graph
AU - Yang, Qing
AU - Jiang, Peng
AU - Zhang, Chunxia
AU - Niu, Zhendong
PY - 2013
Y1 - 2013
N2 - Most previous entity extraction studies focus on a small set of coarse-grained classes, such as person. However, the distribution of entities in search engine query logs indicates that users are more interested in a wider range of fine-grained entities, such as GRAMMY winner and Ivy League member. In this paper, we present a semi-supervised method to extract fine-grained entities from an open-domain corpus. We build a graph over entities that appear in coordinate lists, i.e., HTML nodes that share the same tag path in a DOM tree. Class labels are then propagated over the graph from known entities to unknown ones. Experiments on a large corpus from the ClueWeb09a dataset show that the proposed approach achieves promising results.
AB - Most previous entity extraction studies focus on a small set of coarse-grained classes, such as person. However, the distribution of entities in search engine query logs indicates that users are more interested in a wider range of fine-grained entities, such as GRAMMY winner and Ivy League member. In this paper, we present a semi-supervised method to extract fine-grained entities from an open-domain corpus. We build a graph over entities that appear in coordinate lists, i.e., HTML nodes that share the same tag path in a DOM tree. Class labels are then propagated over the graph from known entities to unknown ones. Experiments on a large corpus from the ClueWeb09a dataset show that the proposed approach achieves promising results.
KW - Coordinate Graph
KW - Fine-Grained Entity Extraction
KW - Label Propagation
UR - http://www.scopus.com/inward/record.url?scp=84884922783&partnerID=8YFLogxK
U2 - 10.1007/978-3-642-38824-8_40
DO - 10.1007/978-3-642-38824-8_40
M3 - Conference contribution
AN - SCOPUS:84884922783
SN - 9783642388231
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 367
EP - 371
BT - Natural Language Processing and Information Systems - 18th International Conference on Applications of Natural Language to Information Systems, NLDB 2013, Proceedings
T2 - 18th International Conference on Applications of Natural Language to Information Systems, NLDB 2013
Y2 - 19 June 2013 through 21 June 2013
ER -