Query Prior Matters: A MRC Framework for Multimodal Named Entity Recognition

Meihuizi Jia; Xin Shen; Lei Shen; Jinhui Pang; Lejian Liao; Yang Song; Meng Chen; Xiaodong He

doi:10.1145/3503161.3548427

Meihuizi Jia, Xin Shen, Lei Shen, Jinhui Pang^*, Lejian Liao, Yang Song, Meng Chen, Xiaodong He

^*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

36 Citations (Scopus)

Abstract

Multimodal named entity recognition (MNER) is a vision-language task where the system is required to detect entity spans and corresponding entity types given a sentence-image pair. Existing methods capture text-image relations with various attention mechanisms that only obtain implicit alignments between entity types and image regions. To locate regions more accurately and better model cross-/within-modal relations, we propose a machine reading comprehension based framework for MNER, namely MRC-MNER. By utilizing queries in MRC, our framework can provide prior information about entity types and image regions. Specifically, we design two stages, Query-Guided Visual Grounding and Multi-Level Modal Interaction, to align fine-grained type-region information and simulate text-image/inner-text interactions respectively. For the former, we train a visual grounding model via transfer learning to extract region candidates that can be further integrated into the second stage to enhance token representations. For the latter, we design text-image and inner-text interaction modules along with three sub-tasks for MRC-MNER. To verify the effectiveness of our model, we conduct extensive experiments on two public MNER datasets, Twitter2015 and Twitter2017. Experimental results show that MRC-MNER outperforms the current state-of-the-art models on Twitter2017, and yields competitive results on Twitter2015.
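To make the query-based formulation in the abstract more concrete, the following is a minimal, text-only sketch of how entity recognition can be posed as machine reading comprehension: the sentence is paired with one query per entity type, and spans are decoded from per-token start/end predictions. It is an illustration under stated assumptions, not the authors' implementation. The entity types (PER, LOC, ORG, MISC), the query wording, all function names, and the nearest-end matching heuristic are hypothetical; the visual grounding stage and the paper's three sub-tasks are not modelled here.

```python
from typing import Dict, List, Tuple

# Hypothetical queries, one per entity type. The type set and the wording
# are assumptions made for illustration; they are not taken from the paper.
TYPE_QUERIES: Dict[str, str] = {
    "PER": "Person: names of people and fictional characters.",
    "LOC": "Location: countries, cities, and other places.",
    "ORG": "Organization: companies, teams, and institutions.",
    "MISC": "Miscellaneous: events, products, and other named entities.",
}


def build_mrc_inputs(tokens: List[str]) -> List[Tuple[str, str, List[str]]]:
    """Pair the sentence with every type query, giving one MRC input per type."""
    return [(etype, query, tokens) for etype, query in TYPE_QUERIES.items()]


def decode_spans(start_flags: List[int], end_flags: List[int]) -> List[Tuple[int, int]]:
    """Match each predicted start with the nearest predicted end at or after it.

    This nearest-end rule is a simplifying heuristic chosen for the sketch.
    """
    spans = []
    for i, is_start in enumerate(start_flags):
        if not is_start:
            continue
        for j in range(i, len(end_flags)):
            if end_flags[j]:
                spans.append((i, j))
                break
    return spans


def extract_entities(
    tokens: List[str],
    predictions: Dict[str, Tuple[List[int], List[int]]],
) -> List[Tuple[str, str]]:
    """Turn per-type start/end flags into (entity text, entity type) pairs."""
    entities = []
    for etype, (starts, ends) in predictions.items():
        for i, j in decode_spans(starts, ends):
            entities.append((" ".join(tokens[i : j + 1]), etype))
    return entities


if __name__ == "__main__":
    tokens = ["Kevin", "Durant", "joins", "the", "Warriors"]
    # Hand-written flags standing in for a model's output under each query.
    predictions = {
        "PER": ([1, 0, 0, 0, 0], [0, 1, 0, 0, 0]),
        "ORG": ([0, 0, 0, 0, 1], [0, 0, 0, 0, 1]),
    }
    print(len(build_mrc_inputs(tokens)))
    print(extract_entities(tokens, predictions))
```

Because each query names its entity type, the type is known before any span is predicted. This is one way to read the abstract's claim that queries supply prior information; in the paper, the same queries also guide the selection of image regions.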

Original language	English
Title of host publication	MM 2022 - Proceedings of the 30th ACM International Conference on Multimedia
Publisher	Association for Computing Machinery, Inc
Pages	3549-3558
Number of pages	10
ISBN (Electronic)	9781450392037
DOI	https://doi.org/10.1145/3503161.3548427
Publication status	Published - 10 Oct 2022
Externally published	Yes
Event	30th ACM International Conference on Multimedia, MM 2022 - Lisboa, Portugal; Duration: 10 Oct 2022 → 14 Oct 2022

Publication series

Name	MM 2022 - Proceedings of the 30th ACM International Conference on Multimedia

Conference

Conference	30th ACM International Conference on Multimedia, MM 2022
Country/Territory	Portugal
City	Lisboa
Period	10/10/22 → 14/10/22

Access to Document

10.1145/3503161.3548427

Other files and links

Link to publication in Scopus

Cite this

Jia, M., Shen, X., Shen, L., Pang, J., Liao, L., Song, Y., Chen, M., & He, X. (2022). Query Prior Matters: A MRC Framework for Multimodal Named Entity Recognition. In MM 2022 - Proceedings of the 30th ACM International Conference on Multimedia (pp. 3549-3558). (MM 2022 - Proceedings of the 30th ACM International Conference on Multimedia). Association for Computing Machinery, Inc. https://doi.org/10.1145/3503161.3548427

@inproceedings{5a99abdc92084f17b6f9044df58921f6,

title = "Query Prior Matters: A MRC Framework for Multimodal Named Entity Recognition",

abstract = "Multimodal named entity recognition (MNER) is a vision-language task where the system is required to detect entity spans and corresponding entity types given a sentence-image pair. Existing methods capture text-image relations with various attention mechanisms that only obtain implicit alignments between entity types and image regions. To locate regions more accurately and better model cross-/within-modal relations, we propose a machine reading comprehension based framework for MNER, namely MRC-MNER. By utilizing queries in MRC, our framework can provide prior information about entity types and image regions. Specifically, we design two stages, Query-Guided Visual Grounding and Multi-Level Modal Interaction, to align fine-grained type-region information and simulate text-image/inner-text interactions respectively. For the former, we train a visual grounding model via transfer learning to extract region candidates that can be further integrated into the second stage to enhance token representations. For the latter, we design text-image and inner-text interaction modules along with three sub-tasks for MRC-MNER. To verify the effectiveness of our model, we conduct extensive experiments on two public MNER datasets, Twitter2015 and Twitter2017. Experimental results show that MRC-MNER outperforms the current state-of-the-art models on Twitter2017, and yields competitive results on Twitter2015.",

keywords = "machine reading comprehension, multimodal named entity recognition, transfer learning, visual grounding",

author = "Meihuizi Jia and Xin Shen and Lei Shen and Jinhui Pang and Lejian Liao and Yang Song and Meng Chen and Xiaodong He",

note = "Publisher Copyright: {\textcopyright} 2022 ACM.; 30th ACM International Conference on Multimedia, MM 2022 ; Conference date: 10-10-2022 Through 14-10-2022",

year = "2022",

month = oct,

day = "10",

doi = "10.1145/3503161.3548427",

language = "English",

series = "MM 2022 - Proceedings of the 30th ACM International Conference on Multimedia",

publisher = "Association for Computing Machinery, Inc",

pages = "3549--3558",

booktitle = "MM 2022 - Proceedings of the 30th ACM International Conference on Multimedia",

}

Jia, M, Shen, X, Shen, L, Pang, J, Liao, L, Song, Y, Chen, M & He, X 2022, Query Prior Matters: A MRC Framework for Multimodal Named Entity Recognition. in MM 2022 - Proceedings of the 30th ACM International Conference on Multimedia. MM 2022 - Proceedings of the 30th ACM International Conference on Multimedia, Association for Computing Machinery, Inc, pp. 3549-3558, 30th ACM International Conference on Multimedia, MM 2022, Lisboa, Portugal, 10/10/22. https://doi.org/10.1145/3503161.3548427

Query Prior Matters: A MRC Framework for Multimodal Named Entity Recognition. / Jia, Meihuizi; Shen, Xin; Shen, Lei et al.
MM 2022 - Proceedings of the 30th ACM International Conference on Multimedia. Association for Computing Machinery, Inc, 2022. p. 3549-3558 (MM 2022 - Proceedings of the 30th ACM International Conference on Multimedia).

TY - GEN

T1 - Query Prior Matters: A MRC Framework for Multimodal Named Entity Recognition

T2 - 30th ACM International Conference on Multimedia, MM 2022

AU - Jia, Meihuizi

AU - Shen, Xin

AU - Shen, Lei

AU - Pang, Jinhui

AU - Liao, Lejian

AU - Song, Yang

AU - Chen, Meng

AU - He, Xiaodong

PY - 2022/10/10

Y1 - 2022/10/10

N2 - Multimodal named entity recognition (MNER) is a vision-language task where the system is required to detect entity spans and corresponding entity types given a sentence-image pair. Existing methods capture text-image relations with various attention mechanisms that only obtain implicit alignments between entity types and image regions. To locate regions more accurately and better model cross-/within-modal relations, we propose a machine reading comprehension based framework for MNER, namely MRC-MNER. By utilizing queries in MRC, our framework can provide prior information about entity types and image regions. Specifically, we design two stages, Query-Guided Visual Grounding and Multi-Level Modal Interaction, to align fine-grained type-region information and simulate text-image/inner-text interactions respectively. For the former, we train a visual grounding model via transfer learning to extract region candidates that can be further integrated into the second stage to enhance token representations. For the latter, we design text-image and inner-text interaction modules along with three sub-tasks for MRC-MNER. To verify the effectiveness of our model, we conduct extensive experiments on two public MNER datasets, Twitter2015 and Twitter2017. Experimental results show that MRC-MNER outperforms the current state-of-the-art models on Twitter2017, and yields competitive results on Twitter2015.

AB - Multimodal named entity recognition (MNER) is a vision-language task where the system is required to detect entity spans and corresponding entity types given a sentence-image pair. Existing methods capture text-image relations with various attention mechanisms that only obtain implicit alignments between entity types and image regions. To locate regions more accurately and better model cross-/within-modal relations, we propose a machine reading comprehension based framework for MNER, namely MRC-MNER. By utilizing queries in MRC, our framework can provide prior information about entity types and image regions. Specifically, we design two stages, Query-Guided Visual Grounding and Multi-Level Modal Interaction, to align fine-grained type-region information and simulate text-image/inner-text interactions respectively. For the former, we train a visual grounding model via transfer learning to extract region candidates that can be further integrated into the second stage to enhance token representations. For the latter, we design text-image and inner-text interaction modules along with three sub-tasks for MRC-MNER. To verify the effectiveness of our model, we conduct extensive experiments on two public MNER datasets, Twitter2015 and Twitter2017. Experimental results show that MRC-MNER outperforms the current state-of-the-art models on Twitter2017, and yields competitive results on Twitter2015.

KW - machine reading comprehension

KW - multimodal named entity recognition

KW - transfer learning

KW - visual grounding

UR - http://www.scopus.com/inward/record.url?scp=85143685415&partnerID=8YFLogxK

U2 - 10.1145/3503161.3548427

DO - 10.1145/3503161.3548427

M3 - Conference contribution

AN - SCOPUS:85143685415

T3 - MM 2022 - Proceedings of the 30th ACM International Conference on Multimedia

SP - 3549

EP - 3558

BT - MM 2022 - Proceedings of the 30th ACM International Conference on Multimedia

PB - Association for Computing Machinery, Inc

Y2 - 10 October 2022 through 14 October 2022

ER -

Jia M, Shen X, Shen L, Pang J, Liao L, Song Y et al. Query Prior Matters: A MRC Framework for Multimodal Named Entity Recognition. In MM 2022 - Proceedings of the 30th ACM International Conference on Multimedia. Association for Computing Machinery, Inc. 2022. p. 3549-3558. (MM 2022 - Proceedings of the 30th ACM International Conference on Multimedia). doi: 10.1145/3503161.3548427
