CrossEM: A Prompt Tuning Framework for Cross-Modal Entity Matching

  • Beijing Institute of Technology
  • Zhejiang University of Technology

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-reviewed

Abstract

Entity matching (EM) aims to identify equivalent entities across different data sources. Current EM approaches assume that the data are either homogeneous with aligned schemas or heterogeneous but transformable into a unified modality. Practical applications over data lakes, such as multi-modal data integration and recommendation systems, urgently need to handle entities of different modalities, yet unifying their data modalities is impractical. To support EM on heterogeneous entities with different data formats and modalities, we propose cross-modal entity matching in this paper. Inspired by the promising performance achieved by recent pre-trained models, we perform cross-modal entity matching by prompt-tuning pre-trained multi-modal large models (MMLMs) in an unsupervised manner. However, prompt-tuning faces three challenges: (i) the objective gap between pre-training and tuning of MMLMs; (ii) the data modality gap between the inputs of MMLMs and our matching task; (iii) prompt efficiency on large data. Therefore, we first propose a novel EM framework (namely, CrossEM) that addresses cross-modal EM as a matching probability problem with specific prompt-tuning. Second, we design two alternative prompt generation methods that extract structural knowledge from heterogeneous data to overcome the data modality gap with pre-trained models. Third, we present an improved matching framework (namely, CrossEM+) to boost prompt efficiency on large heterogeneous data. Experimental evaluations verify that our methods significantly outperform the state-of-the-art approaches on three benchmarks. Furthermore, our case study highlights the considerable potential of cross-modal EM in improving the performance of downstream tasks, thereby benefiting a wider range of research areas.
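
The abstract frames cross-modal EM as a matching probability problem over prompts built from structured data. The following is only a minimal sketch of that general idea, not the CrossEM method itself: it serializes a structured record into a textual prompt and turns similarity scores into matching probabilities with a softmax. The prompt template, the `temperature` value, the function names, and the toy vectors (stand-ins for the outputs of a pre-trained multi-modal encoder) are all assumptions made for illustration.

```python
import math


def serialize_entity(entity: dict) -> str:
    """Turn a structured record into a textual prompt (hypothetical template)."""
    attrs = ", ".join(f"{key} is {value}" for key, value in entity.items())
    return f"a photo of an entity whose {attrs}"


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def matching_probabilities(query_vec, candidate_vecs, temperature=0.1):
    """Softmax over scaled cosine similarities: one probability per candidate."""
    scores = [cosine(query_vec, c) / temperature for c in candidate_vecs]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Toy embeddings: in practice these would come from a multi-modal encoder
# applied to the serialized prompt and to each candidate image.
prompt = serialize_entity({"name": "Eiffel Tower", "type": "landmark"})
text_vec = [0.9, 0.1, 0.0]
image_vecs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

probs = matching_probabilities(text_vec, image_vecs)
best = max(range(len(probs)), key=probs.__getitem__)
print(prompt)
print(best, [round(p, 4) for p in probs])
```

A lower `temperature` sharpens the distribution toward the most similar candidate; how the paper actually defines and tunes the matching probability is not stated in the abstract.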

Original language: English
Title of host publication: Proceedings - 2025 IEEE 41st International Conference on Data Engineering, ICDE 2025
Publisher: IEEE Computer Society
Pages: 627-640
Number of pages: 14
ISBN (electronic): 9798331536039
DOI
Publication status: Published - 2025
Event: 41st IEEE International Conference on Data Engineering, ICDE 2025 - Hong Kong, China
Duration: 19 May 2025 to 23 May 2025

Publication series

Name: Proceedings - International Conference on Data Engineering
ISSN (print): 1084-4627
ISSN (electronic): 2375-0286

Conference

Conference: 41st IEEE International Conference on Data Engineering, ICDE 2025
Country/Territory: China
City: Hong Kong
Period: 19/05/25 to 23/05/25
