A Fuzzy Word Similarity Measure for Selecting Top-k Similar Words in Query Expansion

Qian Liu; Heyan Huang; Junyu Xuan; Guangquan Zhang; Yang Gao; Jie Lu

doi:10.1109/TFUZZ.2020.2993702

Corresponding author: Jie Lu

School of Computer Science

Research output: Contribution to journal › Article › peer-review

16 Citations (Scopus)

Abstract

Top-$k$ words selection is a technique used to detect and return the $k$ most similar words to a given word from a candidate set. This is a crucial and widely used tool in various tasks. The key issue in top-$k$ words selection is how to measure the similarity between words. One popular and effective solution is to use a word embedding-based similarity measure, which represents words as low-dimensional vectors and measures the similarities between words according to the similarity of the vectors, using a metric. However, most word embedding methods only consider the local proximity properties of two words in a corpus. To mitigate this issue, in this article we propose to use association rules for measuring word similarity at a global level, and a fuzzy similarity measure for top-$k$ words selection that jointly encodes the local and the global similarities. Experiments on a real-world query task with three benchmark datasets, i.e., TREC-disk 4&5, WT10G, and RCV1, demonstrate the efficiency of the proposed method compared to several state-of-the-art baselines.
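
As a rough illustration of the idea in the abstract, the sketch below ranks candidate words by a score that mixes a local similarity (cosine similarity of word vectors) with a global one (the confidence of the association rule word -> candidate over a document collection). It is not the authors' method: the abstract does not say how the two similarities are combined, so the probabilistic sum used here is only an assumed stand-in, and the function names, toy vectors, and toy documents are all hypothetical.

```python
import math
from typing import Dict, List, Sequence, Set


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Local similarity: cosine of the angle between two word vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rule_confidence(word: str, candidate: str, docs: List[Set[str]]) -> float:
    """Global similarity: confidence of the rule word -> candidate,
    i.e. the share of documents containing `word` that also contain `candidate`."""
    with_word = [d for d in docs if word in d]
    if not with_word:
        return 0.0
    return sum(1 for d in with_word if candidate in d) / len(with_word)


def top_k(word: str, candidates: List[str],
          embeddings: Dict[str, Sequence[float]],
          docs: List[Set[str]], k: int) -> List[str]:
    """Return the k candidates with the highest combined score."""
    scores = {}
    for c in candidates:
        # Clip the cosine to [0, 1] so it can be treated as a membership degree.
        local = min(1.0, max(0.0, cosine(embeddings[word], embeddings[c])))
        glob = rule_confidence(word, c, docs)
        # ASSUMPTION: probabilistic sum (a fuzzy OR) as the combination rule;
        # the paper's actual fuzzy measure is not given in the abstract.
        scores[c] = local + glob - local * glob
    return sorted(candidates, key=lambda c: scores[c], reverse=True)[:k]


# Hypothetical toy data, for illustration only.
EMBEDDINGS = {
    "car": [1.0, 0.0],
    "auto": [0.9, 0.1],
    "road": [0.5, 0.5],
    "banana": [0.0, 1.0],
}
DOCS = [{"car", "road"}, {"car", "auto"}, {"car", "road", "auto"}, {"banana"}]

if __name__ == "__main__":
    print(top_k("car", ["auto", "road", "banana"], EMBEDDINGS, DOCS, k=2))
```

On this toy data, "auto" ranks above "road" for the query word "car", and "banana" is excluded because it is neither close in the vector space nor co-occurring with "car".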

Original language	English
Article number	9091826
Pages (from-to)	2132-2144
Number of pages	13
Journal	IEEE Transactions on Fuzzy Systems
Volume	29
Issue number	8
DOI	https://doi.org/10.1109/TFUZZ.2020.2993702
Publication status	Published - Aug 2021

Cite this

@article{efc1d6c7bf8a4fc99f99e28b7e67f7b9,

title = "A Fuzzy Word Similarity Measure for Selecting Top-k Similar Words in Query Expansion",

abstract = "Top-$k$ words selection is a technique used to detect and return the $k$ most similar words to a given word from a candidate set. This is a crucial and widely used tool in various tasks. The key issue in top-$k$ words selection is how to measure the similarity between words. One popular and effective solution is to use a word embedding-based similarity measure, which represents words as low-dimensional vectors and measures the similarities between words according to the similarity of the vectors, using a metric. However, most word embedding methods only consider the local proximity properties of two words in a corpus. To mitigate this issue, in this article we propose to use association rules for measuring word similarity at a global level, and a fuzzy similarity measure for top-$k$ words selection that jointly encodes the local and the global similarities. Experiments on a real-world query task with three benchmark datasets, i.e., TREC-disk 4\&5, WT10G, and RCV1, demonstrate the efficiency of the proposed method compared to several state-of-the-art baselines.",

keywords = "Fuzzy logic, Fuzzy machine learning, Natural language processing, Word embedding",

author = "Qian Liu and Heyan Huang and Junyu Xuan and Guangquan Zhang and Yang Gao and Jie Lu",

note = "Publisher Copyright: {\textcopyright} 1993-2012 IEEE.",

year = "2021",

month = aug,

doi = "10.1109/TFUZZ.2020.2993702",

language = "English",

volume = "29",

pages = "2132--2144",

journal = "IEEE Transactions on Fuzzy Systems",

issn = "1063-6706",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

number = "8",

}

TY - JOUR

T1 - A Fuzzy Word Similarity Measure for Selecting Top-k Similar Words in Query Expansion

AU - Liu, Qian

AU - Huang, Heyan

AU - Xuan, Junyu

AU - Zhang, Guangquan

AU - Gao, Yang

AU - Lu, Jie

PY - 2021/8

Y1 - 2021/8

AB - Top-$k$ words selection is a technique used to detect and return the $k$ most similar words to a given word from a candidate set. This is a crucial and widely used tool in various tasks. The key issue in top-$k$ words selection is how to measure the similarity between words. One popular and effective solution is to use a word embedding-based similarity measure, which represents words as low-dimensional vectors and measures the similarities between words according to the similarity of the vectors, using a metric. However, most word embedding methods only consider the local proximity properties of two words in a corpus. To mitigate this issue, in this article we propose to use association rules for measuring word similarity at a global level, and a fuzzy similarity measure for top-$k$ words selection that jointly encodes the local and the global similarities. Experiments on a real-world query task with three benchmark datasets, i.e., TREC-disk 4&5, WT10G, and RCV1, demonstrate the efficiency of the proposed method compared to several state-of-the-art baselines.

KW - Fuzzy logic

KW - Fuzzy machine learning

KW - Natural language processing

KW - Word embedding

UR - http://www.scopus.com/inward/record.url?scp=85112674971&partnerID=8YFLogxK

U2 - 10.1109/TFUZZ.2020.2993702

DO - 10.1109/TFUZZ.2020.2993702

M3 - Article

AN - SCOPUS:85112674971

SN - 1063-6706

VL - 29

SP - 2132

EP - 2144

JO - IEEE Transactions on Fuzzy Systems

JF - IEEE Transactions on Fuzzy Systems

IS - 8

M1 - 9091826

ER -
