TY - GEN
T1 - Entity set expansion from Twitter
AU - Zhao, He
AU - Feng, Chong
AU - Luo, Zhunchen
AU - Tian, Changhai
N1 - Publisher Copyright: © 2018 Association for Computing Machinery.
PY - 2018/9/10
Y1 - 2018/9/10
N2 - Online social media yields large-scale corpora that are fairly informative and sometimes include many up-to-date entities. The challenge in expanding entity sets on social media text is to extract less common entities using only a few seeds already in hand. In this paper, we present an approach that finds novel entities by expanding a small initial seed set on Twitter text. Our method first generates candidate sets on the basis of a semantic similarity feature. It then jointly uses 2 text-based features and 12 other features that carry social-media-specific information. With the scores on those features, a ranking model is learned by a supervised algorithm to compute a combined score for each candidate term, and the final ranked list is taken as the target expanded set. We conduct experiments with 24 entity classes on a Twitter corpus, and the expanded sets contain many novel entities that have not been fully detected in previous research. The experimental results on datasets from different years are consistent with the expectation that fresh entities change over time.
AB - Online social media yields large-scale corpora that are fairly informative and sometimes include many up-to-date entities. The challenge in expanding entity sets on social media text is to extract less common entities using only a few seeds already in hand. In this paper, we present an approach that finds novel entities by expanding a small initial seed set on Twitter text. Our method first generates candidate sets on the basis of a semantic similarity feature. It then jointly uses 2 text-based features and 12 other features that carry social-media-specific information. With the scores on those features, a ranking model is learned by a supervised algorithm to compute a combined score for each candidate term, and the final ranked list is taken as the target expanded set. We conduct experiments with 24 entity classes on a Twitter corpus, and the expanded sets contain many novel entities that have not been fully detected in previous research. The experimental results on datasets from different years are consistent with the expectation that fresh entities change over time.
KW - social media mining
KW - entity set expansion
KW - information extraction
UR - https://www.scopus.com/pages/publications/85063424768
U2 - 10.1145/3234944.3234966
DO - 10.1145/3234944.3234966
M3 - Conference contribution
AN - SCOPUS:85063424768
T3 - ICTIR 2018 - Proceedings of the 2018 ACM SIGIR International Conference on the Theory of Information Retrieval
SP - 155
EP - 162
BT - ICTIR 2018 - Proceedings of the 2018 ACM SIGIR International Conference on the Theory of Information Retrieval
PB - Association for Computing Machinery, Inc
T2 - 8th ACM SIGIR International Conference on the Theory of Information Retrieval, ICTIR 2018
Y2 - 14 September 2018 through 17 September 2018
ER -