TY - GEN
T1 - Outdated fact detection in knowledge bases
AU - Hao, Shuang
AU - Chai, Chengliang
AU - Li, Guoliang
AU - Tang, Nan
AU - Wang, Ning
AU - Yu, Xiang
N1 - Publisher Copyright: © 2020 IEEE.
PY - 2020/4
Y1 - 2020/4
AB - Knowledge bases (KBs), which store high-quality information, are crucial for many applications, such as enhancing search results and serving as external sources for data cleaning. Not surprisingly, most KBs contain outdated facts because information changes rapidly, so it is important to keep KBs up to date. Prior work has investigated using reference data (such as new facts extracted from the news) to detect outdated facts in KBs. However, existing approaches can cover only a small percentage of the facts in a KB. In this paper, we propose a novel human-in-the-loop approach for outdated fact detection in KBs. It trains a binary classifier, using features such as a fact's historical update frequency and existence time, to compute the likelihood that a fact in a KB is outdated. It then interacts with humans to verify whether a fact with a high likelihood is indeed outdated. In addition, it uses logical rules to detect more outdated facts based on the human feedback. The outdated facts detected by the logical rules are fed back as augmented training data to further train the ML model. Extensive experiments on real-world KBs, such as Yago and DBpedia, show the effectiveness of our solution.
UR - http://www.scopus.com/inward/record.url?scp=85085856350&partnerID=8YFLogxK
U2 - 10.1109/ICDE48307.2020.00196
DO - 10.1109/ICDE48307.2020.00196
M3 - Conference contribution
AN - SCOPUS:85085856350
T3 - Proceedings - International Conference on Data Engineering
SP - 1890
EP - 1893
BT - Proceedings - 2020 IEEE 36th International Conference on Data Engineering, ICDE 2020
PB - IEEE Computer Society
T2 - 36th IEEE International Conference on Data Engineering, ICDE 2020
Y2 - 20 April 2020 through 24 April 2020
ER -