TY - GEN
T1 - CrossETR
T2 - 41st IEEE International Conference on Data Engineering, ICDE 2025
AU - Yuan, Qin
AU - Wen, Zhenyu
AU - Qian, Jiaxu
AU - Yuan, Ye
AU - Wang, Guoren
N1 - Publisher Copyright:
© 2025 IEEE.
PY - 2025
Y1 - 2025
N2 - Entity matching (EM) aims to identify whether two entities from different data sources refer to the same real-world entity. Most existing cross-modal EM approaches assume that images depict simple scenes containing few objects, or do not fully exploit the cross-modal knowledge associated with entities. To support more practical application scenarios, such as multi-modal knowledge graph integration and visual question answering in data lakes, we introduce the problem of semantic-driven EM across graphs and images in this paper. Current semantic matching solutions over cross-modal data suffer from low training efficiency, since their time complexity grows quadratically with the number of entities. To alleviate this issue, we present a novel framework (named CrossETR) that follows an exploration-then-refinement paradigm. First, a candidate exploration policy is proposed to boost training efficiency. It explores candidate pairs according to entity correlations and captures structural semantics by adaptively sampling the most informative neighborhood subgraphs. Second, cross-modal entity representations are refined to bridge modality heterogeneity and support unsupervised matching prediction. Extensive experimental evaluations on three publicly available benchmarks demonstrate the superiority of CrossETR over state-of-the-art approaches in terms of both effectiveness and efficiency. Furthermore, a case study highlights that our proposed semantic-driven EM is promising for improving the performance of downstream tasks such as multi-modal knowledge graph integration.
AB - Entity matching (EM) aims to identify whether two entities from different data sources refer to the same real-world entity. Most existing cross-modal EM approaches assume that images depict simple scenes containing few objects, or do not fully exploit the cross-modal knowledge associated with entities. To support more practical application scenarios, such as multi-modal knowledge graph integration and visual question answering in data lakes, we introduce the problem of semantic-driven EM across graphs and images in this paper. Current semantic matching solutions over cross-modal data suffer from low training efficiency, since their time complexity grows quadratically with the number of entities. To alleviate this issue, we present a novel framework (named CrossETR) that follows an exploration-then-refinement paradigm. First, a candidate exploration policy is proposed to boost training efficiency. It explores candidate pairs according to entity correlations and captures structural semantics by adaptively sampling the most informative neighborhood subgraphs. Second, cross-modal entity representations are refined to bridge modality heterogeneity and support unsupervised matching prediction. Extensive experimental evaluations on three publicly available benchmarks demonstrate the superiority of CrossETR over state-of-the-art approaches in terms of both effectiveness and efficiency. Furthermore, a case study highlights that our proposed semantic-driven EM is promising for improving the performance of downstream tasks such as multi-modal knowledge graph integration.
KW - cross-modal entity matching
KW - data lake
KW - semantic-driven matching
UR - https://www.scopus.com/pages/publications/105015523033
U2 - 10.1109/ICDE65448.2025.00054
DO - 10.1109/ICDE65448.2025.00054
M3 - Conference contribution
AN - SCOPUS:105015523033
T3 - Proceedings - International Conference on Data Engineering
SP - 641
EP - 654
BT - Proceedings - 2025 IEEE 41st International Conference on Data Engineering, ICDE 2025
PB - IEEE Computer Society
Y2 - 19 May 2025 through 23 May 2025
ER -