Wasserstein coupled graph learning for cross-modal retrieval

Yun Wang; Tong Zhang; Xueya Zhang; Zhen Cui; Yuge Huang; Pengcheng Shen; Shaoxin Li; Jian Yang

doi:10.1109/ICCV48922.2021.00183

Yun Wang, Tong Zhang, Xueya Zhang, Zhen Cui^*, Yuge Huang, Pengcheng Shen, Shaoxin Li, Jian Yang

^*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

19 Citations (Scopus)

Abstract

Graphs play an important role in cross-modal image-text understanding as they characterize the intrinsic structure which is robust and crucial for the measurement of cross-modal similarity. In this work, we propose a Wasserstein Coupled Graph Learning (WCGL) method to deal with the cross-modal retrieval task. First, graphs are constructed according to two input cross-modal samples separately, and passed through the corresponding graph encoders to extract robust features. Then, a Wasserstein coupled dictionary, containing multiple pairs of counterpart graph keys with each key corresponding to one modality, is constructed for further feature learning. Based on this dictionary, the input graphs can be transformed into the dictionary space to facilitate the similarity measurement through a Wasserstein Graph Embedding (WGE) process. The WGE could capture the graph correlation between the input and each corresponding key through optimal transport, and hence well characterize the inter-graph structural relationship. To further achieve discriminant graph learning, we specifically define a Wasserstein discriminant loss on the coupled graph keys to make the intra-class (counterpart) keys more compact and inter-class (non-counterpart) keys more dispersed, which further promotes the final cross-modal retrieval task. Experimental results demonstrate the effectiveness and state-of-the-art performance.

Original language	English
Title of host publication	Proceedings - 2021 IEEE/CVF International Conference on Computer Vision, ICCV 2021
Publisher	Institute of Electrical and Electronics Engineers Inc.
Pages	1793-1802
Number of pages	10
ISBN (Electronic)	9781665428125
DOI	https://doi.org/10.1109/ICCV48922.2021.00183
Publication status	Published - 2021
Externally published	Yes
Event	18th IEEE/CVF International Conference on Computer Vision, ICCV 2021 - Virtual, Online, Canada. Duration: 11 Oct 2021 → 17 Oct 2021

Publication series

Name	Proceedings of the IEEE International Conference on Computer Vision
ISSN (Print)	1550-5499

Conference

Conference	18th IEEE/CVF International Conference on Computer Vision, ICCV 2021
Country/Territory	Canada
City	Virtual, Online
Period	11/10/21 → 17/10/21

Access to Document

10.1109/ICCV48922.2021.00183

Cite this

Wang, Y., Zhang, T., Zhang, X., Cui, Z., Huang, Y., Shen, P., Li, S., & Yang, J. (2021). Wasserstein coupled graph learning for cross-modal retrieval. In Proceedings - 2021 IEEE/CVF International Conference on Computer Vision, ICCV 2021 (pp. 1793-1802). (Proceedings of the IEEE International Conference on Computer Vision). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ICCV48922.2021.00183

@inproceedings{698abdca10cc47748fc1adffb5dfa8c3,

title = "Wasserstein coupled graph learning for cross-modal retrieval",

abstract = "Graphs play an important role in cross-modal image-text understanding as they characterize the intrinsic structure which is robust and crucial for the measurement of cross-modal similarity. In this work, we propose a Wasserstein Coupled Graph Learning (WCGL) method to deal with the cross-modal retrieval task. First, graphs are constructed according to two input cross-modal samples separately, and passed through the corresponding graph encoders to extract robust features. Then, a Wasserstein coupled dictionary, containing multiple pairs of counterpart graph keys with each key corresponding to one modality, is constructed for further feature learning. Based on this dictionary, the input graphs can be transformed into the dictionary space to facilitate the similarity measurement through a Wasserstein Graph Embedding (WGE) process. The WGE could capture the graph correlation between the input and each corresponding key through optimal transport, and hence well characterize the inter-graph structural relationship. To further achieve discriminant graph learning, we specifically define a Wasserstein discriminant loss on the coupled graph keys to make the intra-class (counterpart) keys more compact and inter-class (non-counterpart) keys more dispersed, which further promotes the final cross-modal retrieval task. Experimental results demonstrate the effectiveness and state-of-the-art performance.",

author = "Yun Wang and Tong Zhang and Xueya Zhang and Zhen Cui and Yuge Huang and Pengcheng Shen and Shaoxin Li and Jian Yang",

note = "Publisher Copyright: {\textcopyright} 2021 IEEE; 18th IEEE/CVF International Conference on Computer Vision, ICCV 2021; Conference date: 11-10-2021 Through 17-10-2021",

year = "2021",

doi = "10.1109/ICCV48922.2021.00183",

language = "English",

series = "Proceedings of the IEEE International Conference on Computer Vision",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

pages = "1793--1802",

booktitle = "Proceedings - 2021 IEEE/CVF International Conference on Computer Vision, ICCV 2021",

address = "United States",

}

Wang, Y, Zhang, T, Zhang, X, Cui, Z, Huang, Y, Shen, P, Li, S & Yang, J 2021, Wasserstein coupled graph learning for cross-modal retrieval. in Proceedings - 2021 IEEE/CVF International Conference on Computer Vision, ICCV 2021. Proceedings of the IEEE International Conference on Computer Vision, Institute of Electrical and Electronics Engineers Inc., pp. 1793-1802, 18th IEEE/CVF International Conference on Computer Vision, ICCV 2021, Virtual, Online, Canada, 11/10/21. https://doi.org/10.1109/ICCV48922.2021.00183

Wasserstein coupled graph learning for cross-modal retrieval. / Wang, Yun; Zhang, Tong; Zhang, Xueya et al.
Proceedings - 2021 IEEE/CVF International Conference on Computer Vision, ICCV 2021. Institute of Electrical and Electronics Engineers Inc., 2021. pp. 1793-1802 (Proceedings of the IEEE International Conference on Computer Vision).

TY - GEN

T1 - Wasserstein coupled graph learning for cross-modal retrieval

AU - Wang, Yun

AU - Zhang, Tong

AU - Zhang, Xueya

AU - Cui, Zhen

AU - Huang, Yuge

AU - Shen, Pengcheng

AU - Li, Shaoxin

AU - Yang, Jian

PY - 2021

Y1 - 2021

N2 - Graphs play an important role in cross-modal image-text understanding as they characterize the intrinsic structure which is robust and crucial for the measurement of cross-modal similarity. In this work, we propose a Wasserstein Coupled Graph Learning (WCGL) method to deal with the cross-modal retrieval task. First, graphs are constructed according to two input cross-modal samples separately, and passed through the corresponding graph encoders to extract robust features. Then, a Wasserstein coupled dictionary, containing multiple pairs of counterpart graph keys with each key corresponding to one modality, is constructed for further feature learning. Based on this dictionary, the input graphs can be transformed into the dictionary space to facilitate the similarity measurement through a Wasserstein Graph Embedding (WGE) process. The WGE could capture the graph correlation between the input and each corresponding key through optimal transport, and hence well characterize the inter-graph structural relationship. To further achieve discriminant graph learning, we specifically define a Wasserstein discriminant loss on the coupled graph keys to make the intra-class (counterpart) keys more compact and inter-class (non-counterpart) keys more dispersed, which further promotes the final cross-modal retrieval task. Experimental results demonstrate the effectiveness and state-of-the-art performance.

UR - http://www.scopus.com/inward/record.url?scp=85127741004&partnerID=8YFLogxK

U2 - 10.1109/ICCV48922.2021.00183

DO - 10.1109/ICCV48922.2021.00183

M3 - Conference contribution

AN - SCOPUS:85127741004

T3 - Proceedings of the IEEE International Conference on Computer Vision

SP - 1793

EP - 1802

BT - Proceedings - 2021 IEEE/CVF International Conference on Computer Vision, ICCV 2021

PB - Institute of Electrical and Electronics Engineers Inc.

T2 - 18th IEEE/CVF International Conference on Computer Vision, ICCV 2021

Y2 - 11 October 2021 through 17 October 2021

ER -

Wang Y, Zhang T, Zhang X, Cui Z, Huang Y, Shen P et al. Wasserstein coupled graph learning for cross-modal retrieval. In Proceedings - 2021 IEEE/CVF International Conference on Computer Vision, ICCV 2021. Institute of Electrical and Electronics Engineers Inc. 2021. p. 1793-1802. (Proceedings of the IEEE International Conference on Computer Vision). doi: 10.1109/ICCV48922.2021.00183
