TY - JOUR
T1 - Adaptive online event detection in news streams
AU - Hu, Linmei
AU - Zhang, Bin
AU - Hou, Lei
AU - Li, Juanzi
N1 - Publisher Copyright: © 2017 Elsevier B.V.
PY - 2017/12/15
Y1 - 2017/12/15
AB - Event detection aims to discover news documents that report on the same event and arrange them under the same group. With the explosive growth of online news, there is a need for event detection to facilitate better navigation for users in news spaces. Existing works usually represent documents based on the TF-IDF scheme and use a clustering algorithm for event detection. However, the traditional TF-IDF vector representation suffers from high dimensionality and sparse semantics. In addition, as more news documents arrive, IDF needs to be incrementally updated. In this paper, we present a novel document representation method based on word embeddings, which reduces the dimensionality and alleviates the sparse semantics compared to TF-IDF, and thus improves efficiency and accuracy. Based on the document representation, we propose an adaptive online clustering method for online news event detection, which improves both precision and recall by using time slicing and event merging, respectively. The resulting events are further improved by an adaptive post-processing step which can automatically detect noisy events and further process them. Experiments on standard and real-world datasets show that our proposed adaptive online event detection method significantly improves the performance of event detection in terms of both efficiency and accuracy compared to state-of-the-art methods.
KW - Adaptive online clustering
KW - Event detection
KW - Word embedding
UR - http://www.scopus.com/inward/record.url?scp=85031315274&partnerID=8YFLogxK
U2 - 10.1016/j.knosys.2017.09.039
DO - 10.1016/j.knosys.2017.09.039
M3 - Article
AN - SCOPUS:85031315274
SN - 0950-7051
VL - 138
SP - 105
EP - 112
JO - Knowledge-Based Systems
JF - Knowledge-Based Systems
ER -