TY - JOUR
T1 - Keyword Query over Error-Tolerant Knowledge Bases
AU - Cheng, Yu Rong
AU - Yuan, Ye
AU - Li, Jia Yu
AU - Chen, Lei
AU - Wang, Guo Ren
N1 - Publisher Copyright:
© 2016, Springer Science+Business Media New York.
PY - 2016/7/1
Y1 - 2016/7/1
N2 - With more and more knowledge provided by WWW, querying and mining the knowledge bases have attracted much research attention. Among all the queries over knowledge bases, which are usually modelled as graphs, a keyword query is the most widely used one. Although the problem of keyword query over graphs has been deeply studied for years, knowledge bases, as special error-tolerant graphs, lead to the results of the traditional defined keyword queries out of users’ satisfaction. Thus, in this paper, we define a new keyword query, called confident r-clique, specific for knowledge bases based on the r-clique definition for keyword query on general graphs, which has been proved to be the best one. However, as we prove in the paper, finding the confident r-cliques is #P-hard. We propose a filtering-and-verification framework to improve the search efficiency. In the filtering phase, we develop the tightest upper bound of the confident r-clique, and design an index together with its search algorithm, which suits the large scale of knowledge bases well. In the verification phase, we develop an efficient sampling method to verify the final answers from the candidates remaining in the filtering phase. Extensive experiments demonstrate that the results derived from our new definition satisfy the users’ requirement better compared with the traditional r-clique definition, and our algorithms are efficient.
AB - With more and more knowledge provided by WWW, querying and mining the knowledge bases have attracted much research attention. Among all the queries over knowledge bases, which are usually modelled as graphs, a keyword query is the most widely used one. Although the problem of keyword query over graphs has been deeply studied for years, knowledge bases, as special error-tolerant graphs, lead to the results of the traditional defined keyword queries out of users’ satisfaction. Thus, in this paper, we define a new keyword query, called confident r-clique, specific for knowledge bases based on the r-clique definition for keyword query on general graphs, which has been proved to be the best one. However, as we prove in the paper, finding the confident r-cliques is #P-hard. We propose a filtering-and-verification framework to improve the search efficiency. In the filtering phase, we develop the tightest upper bound of the confident r-clique, and design an index together with its search algorithm, which suits the large scale of knowledge bases well. In the verification phase, we develop an efficient sampling method to verify the final answers from the candidates remaining in the filtering phase. Extensive experiments demonstrate that the results derived from our new definition satisfy the users’ requirement better compared with the traditional r-clique definition, and our algorithms are efficient.
KW - Keyword query
KW - error-tolerant knowledge base
KW - index
UR - http://www.scopus.com/inward/record.url?scp=84978174733&partnerID=8YFLogxK
U2 - 10.1007/s11390-016-1658-y
DO - 10.1007/s11390-016-1658-y
M3 - Article
AN - SCOPUS:84978174733
SN - 1000-9000
VL - 31
SP - 702
EP - 719
JO - Journal of Computer Science and Technology
JF - Journal of Computer Science and Technology
IS - 4
ER -