Deep attribute-preserving metric learning for natural language object retrieval

Jianan Li, Yunchao Wei, Xiaodan Liang, Fang Zhao, Jianshu Li, Tingfa Xu*, Jiashi Feng

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; Peer-reviewed

21 Citations (Scopus)

Abstract

Retrieving image content with a natural language expression is an emerging interdisciplinary problem at the intersection of multimedia, natural language processing and artificial intelligence. Existing methods tackle this challenging problem by learning features from the visual and linguistic domains independently, while the critical semantic correlations bridging the two domains have been under-explored in the feature learning process. In this paper, we propose to exploit sharable semantic attributes as "anchors" to ensure the learned features are well aligned across domains for better object retrieval. We define "attributes" as the common concepts that are informative for object retrieval and can be easily learned from both visual content and language expressions. In particular, diverse and complex attributes (e.g., location, color, category, interaction between object and context) are modeled and incorporated to promote cross-domain alignment for feature learning from multiple perspectives. Based on the sharable attributes, we propose a deep Attribute-Preserving Metric learning (AP-Metric) framework that jointly generates unique query-sensitive region proposals and conducts novel cross-modal feature learning that explicitly pursues consistency over semantic attribute abstraction within both domains for deep metric learning. Benefiting from the cross-modal semantic correlations, our proposed framework can accurately localize challenging visual objects that match complex query expressions within cluttered backgrounds. The overall framework is end-to-end trainable. Extensive evaluations on popular datasets including ReferItGame [18], RefCOCO, and RefCOCO+ [43] demonstrate its superiority. Notably, it achieves state-of-the-art performance on the challenging ReferItGame dataset.

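Original language: English
Title of host publication: MM 2017 - Proceedings of the 2017 ACM Multimedia Conference
Publisher: Association for Computing Machinery, Inc
Pages: 181-189
Number of pages: 9
ISBN (Electronic): 9781450349062
DOI: https://doi.org/10.1145/3123266.3123439
Publication status: Published - 23 Oct 2017
Event: 25th ACM International Conference on Multimedia, MM 2017 - Mountain View, United States
Duration: 23 Oct 2017 to 27 Oct 2017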

Publication series

Name: MM 2017 - Proceedings of the 2017 ACM Multimedia Conference

Conference

Conference: 25th ACM International Conference on Multimedia, MM 2017
Country/Territory: United States
City: Mountain View
Period: 23/10/17 to 27/10/17

Cite this

Li, J., Wei, Y., Liang, X., Zhao, F., Li, J., Xu, T., & Feng, J. (2017). Deep attribute-preserving metric learning for natural language object retrieval. In MM 2017 - Proceedings of the 2017 ACM Multimedia Conference (pp. 181-189). (MM 2017 - Proceedings of the 2017 ACM Multimedia Conference). Association for Computing Machinery, Inc. https://doi.org/10.1145/3123266.3123439