TY - GEN
T1 - Representation Learning for Entity Alignment in Knowledge Graph
T2 - 40th IEEE International Conference on Data Engineering, ICDE 2024
AU - Huang, Peng
AU - Zhang, Meihui
AU - Zhong, Ziyue
AU - Chai, Chengliang
AU - Fan, Ju
N1 - Publisher Copyright:
© 2024 IEEE.
PY - 2024
Y1 - 2024
N2 - Entity alignment (EA) is a critical task in knowledge fusion, focusing on identifying equivalent entities in different knowledge graphs (KGs). As representation learning techniques have advanced, EA methods have achieved notable improvements on current EA datasets, and several benchmark studies have been conducted. However, we have identified two limitations of existing benchmarks. (1) They perform coarse-grained evaluation, which analyzes each EA approach as a whole. However, a typical EA framework consists of multiple modules, each of which has different strategies. The combinations of these strategies may provide more optimization opportunities, which are unexplored in current studies. (2) Current EA datasets tested in existing studies always contain dense information. However, real-world applications often involve noisy and missing data, which complicates EA tasks. To address this, we propose a new benchmark that explores the design space of the EA framework, which consists of the embedding, relation, attribute, and alignment modules. Each module has multiple strategies. We also synthesize multiple datasets based on real-world datasets to cover different complex scenarios. Based on the design space and various datasets, we aim to provide a general guideline that recommends the most effective strategy for EA under practical settings. We conduct extensive experiments by comparing 13 baseline methods over 4 real datasets and 12 synthesized datasets. Based on the experimental observations, we also propose a new EA method that outperforms existing baselines.
AB - Entity alignment (EA) is a critical task in knowledge fusion, focusing on identifying equivalent entities in different knowledge graphs (KGs). As representation learning techniques have advanced, EA methods have achieved notable improvements on current EA datasets, and several benchmark studies have been conducted. However, we have identified two limitations of existing benchmarks. (1) They perform coarse-grained evaluation, which analyzes each EA approach as a whole. However, a typical EA framework consists of multiple modules, each of which has different strategies. The combinations of these strategies may provide more optimization opportunities, which are unexplored in current studies. (2) Current EA datasets tested in existing studies always contain dense information. However, real-world applications often involve noisy and missing data, which complicates EA tasks. To address this, we propose a new benchmark that explores the design space of the EA framework, which consists of the embedding, relation, attribute, and alignment modules. Each module has multiple strategies. We also synthesize multiple datasets based on real-world datasets to cover different complex scenarios. Based on the design space and various datasets, we aim to provide a general guideline that recommends the most effective strategy for EA under practical settings. We conduct extensive experiments by comparing 13 baseline methods over 4 real datasets and 12 synthesized datasets. Based on the experimental observations, we also propose a new EA method that outperforms existing baselines.
UR - http://www.scopus.com/inward/record.url?scp=85200444930&partnerID=8YFLogxK
U2 - 10.1109/ICDE60146.2024.00267
DO - 10.1109/ICDE60146.2024.00267
M3 - Conference contribution
AN - SCOPUS:85200444930
T3 - Proceedings - International Conference on Data Engineering
SP - 3462
EP - 3475
BT - Proceedings - 2024 IEEE 40th International Conference on Data Engineering, ICDE 2024
PB - IEEE Computer Society
Y2 - 13 May 2024 through 17 May 2024
ER -