Entity set expansion from Twitter

  • He Zhao*
  • Chong Feng
  • Zhunchen Luo
  • Changhai Tian

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review

Abstract

Online social media yields large-scale corpora that are fairly informative and sometimes include many up-to-date entities. The challenge in expanding entity sets on social media text is to extract more uncommon entities using only the few seeds already in hand. In this paper, we present an approach that finds novel entities by expanding a small initial seed set on Twitter text. Our method first generates candidate sets on the basis of a semantic similarity feature. It then jointly uses two text-based features and 12 others that carry social-media-specific information. From the scores on those features, a supervised algorithm learns a ranking model that assigns each candidate term a combined score, and the final ranked list is taken as the target expanded set. We run experiments with 24 entity classes on a Twitter corpus, and the expanded sets contain many novel entities that previous research has not fully detected. The experimental results on datasets from different years are consistent with the expectation that fresh entities change over time.
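The two stages described in the abstract (candidate generation by semantic similarity, then feature-based ranking) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the toy two-dimensional embeddings, the single extra feature per candidate, the function names `generate_candidates` and `rank_candidates`, and the hand-set linear weights, which stand in for a ranking model that the paper learns with a supervised algorithm.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def generate_candidates(seeds, embeddings, top_k):
    """Return the top_k non-seed terms by mean cosine similarity to the seeds."""
    scored = []
    for term, vec in embeddings.items():
        if term in seeds:
            continue
        sim = sum(cosine(vec, embeddings[s]) for s in seeds) / len(seeds)
        scored.append((term, sim))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


def rank_candidates(candidates, features, weights):
    """Score each candidate as a weighted sum of its features and sort.

    The fixed weights are a stand-in for a learned ranking model.
    """
    ranked = []
    for term, sim in candidates:
        feats = [sim] + features.get(term, [])
        score = sum(w * f for w, f in zip(weights, feats))
        ranked.append((term, score))
    ranked.sort(key=lambda pair: pair[1], reverse=True)
    return [term for term, _ in ranked]


# Hypothetical toy data: 2-d "embeddings" and one extra feature per candidate.
embeddings = {
    "paris": [1.0, 0.0],
    "london": [0.9, 0.1],
    "berlin": [0.8, 0.2],
    "banana": [0.0, 1.0],
    "tokyo": [0.7, 0.3],
}
seeds = {"paris", "london"}
candidates = generate_candidates(seeds, embeddings, top_k=2)
features = {"berlin": [0.2], "tokyo": [0.9]}  # e.g. a made-up popularity signal
weights = [1.0, 0.5]
expanded = rank_candidates(candidates, features, weights)
print(expanded)
```

In this toy run, similarity alone places "berlin" ahead of "tokyo", while the extra feature moves "tokyo" to the top of the final list, which shows why the ranking stage can change the order produced by candidate generation.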

Original language: English
Title of host publication: ICTIR 2018 - Proceedings of the 2018 ACM SIGIR International Conference on the Theory of Information Retrieval
Publisher: Association for Computing Machinery, Inc
Pages: 155-162
Number of pages: 8
ISBN (Electronic): 9781450356565
DOIs
Publication status: Published - 10 Sept 2018
Event: 8th ACM SIGIR International Conference on the Theory of Information Retrieval, ICTIR 2018 - Tianjin, China
Duration: 14 Sept 2018 to 17 Sept 2018

Publication series

Name: ICTIR 2018 - Proceedings of the 2018 ACM SIGIR International Conference on the Theory of Information Retrieval

Conference

Conference: 8th ACM SIGIR International Conference on the Theory of Information Retrieval, ICTIR 2018
Country/Territory: China
City: Tianjin
Period: 14/09/18 to 17/09/18

Keywords

  • Social media mining
  • entity set expansion
  • information extraction
