TY - JOUR
T1 - A Fuzzy Word Similarity Measure for Selecting Top-k Similar Words in Query Expansion
AU - Liu, Qian
AU - Huang, Heyan
AU - Xuan, Junyu
AU - Zhang, Guangquan
AU - Gao, Yang
AU - Lu, Jie
N1 - Publisher Copyright: © 1993-2012 IEEE.
PY - 2021/8
Y1 - 2021/8
N2 - Top-$k$ words selection is a technique used to detect and return the $k$ most similar words to a given word from a candidate set. This is a crucial and widely used tool in various tasks. The key issue in top-$k$ words selection is how to measure the similarity between words. One popular and effective solution is to use a word embedding-based similarity measure, which represents words as low-dimensional vectors and measures the similarities between words according to the similarity of the vectors using a metric. However, most word embedding methods only consider the local proximity properties of two words in a corpus. To mitigate this issue, in this article we propose to use association rules for measuring word similarity at a global level, and a fuzzy similarity measure for top-$k$ words selection that jointly encodes the local and the global similarities. Experiments on a real-world query task with three benchmark datasets, i.e., TREC-disk 4&5, WT10G, and RCV1, demonstrate the efficiency of the proposed method compared to several state-of-the-art baselines.
AB - Top-$k$ words selection is a technique used to detect and return the $k$ most similar words to a given word from a candidate set. This is a crucial and widely used tool in various tasks. The key issue in top-$k$ words selection is how to measure the similarity between words. One popular and effective solution is to use a word embedding-based similarity measure, which represents words as low-dimensional vectors and measures the similarities between words according to the similarity of the vectors using a metric. However, most word embedding methods only consider the local proximity properties of two words in a corpus. To mitigate this issue, in this article we propose to use association rules for measuring word similarity at a global level, and a fuzzy similarity measure for top-$k$ words selection that jointly encodes the local and the global similarities. Experiments on a real-world query task with three benchmark datasets, i.e., TREC-disk 4&5, WT10G, and RCV1, demonstrate the efficiency of the proposed method compared to several state-of-the-art baselines.
KW - Fuzzy logic
KW - Fuzzy machine learning
KW - Natural language processing
KW - Word embedding
UR - http://www.scopus.com/inward/record.url?scp=85112674971&partnerID=8YFLogxK
U2 - 10.1109/TFUZZ.2020.2993702
DO - 10.1109/TFUZZ.2020.2993702
M3 - Article
AN - SCOPUS:85112674971
SN - 1063-6706
VL - 29
SP - 2132
EP - 2144
JO - IEEE Transactions on Fuzzy Systems
JF - IEEE Transactions on Fuzzy Systems
IS - 8
M1 - 9091826
ER -