Orthogonal Vector-Decomposed Disentanglement Network of Interactive Image Retrieval for Fashion Outfit Recommendation

Chen Chen; Jie Guo; Bin Song; Tong Zhang

doi:10.1145/3552468.3555362

Chen Chen, Jie Guo, Bin Song^*, Tong Zhang

^*Corresponding author for this work

Xidian University

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Interactive image retrieval for fashion outfit recommendation is a challenging task that aims to find the desired target image from a multi-modal query (a reference image and a modification text). Previous studies focus on exploring effective feature composition methods to achieve similarity matching between different modalities. However, feature redundancy and semantic inconsistency between modalities introduce much task-irrelevant information. This makes it difficult to correctly identify the particular information to be modified and inevitably introduces noise that leads to suboptimal performance. To this end, we present a novel Orthogonal Vector-Decomposed Disentanglement Network (OVDDN) for image retrieval. It leverages the disentangled parts to learn a controllable denoising embedding space. First, we design an orthogonal disentanglement module, which is applied to both image and text features to decouple them into two independent components (invariant and specific) through orthogonal constraints. A similarity metric loss ensures semantic consistency of paired images. Then, an attention network composes the invariant part of the reference image with the task-related part of the text to match the target image. Finally, a differential feature alignment module maintains cross-modal semantic consistency. Extensive experiments on three benchmark datasets show that OVDDN achieves consistently superior performance. Ablation analyses further verify the effectiveness of the proposed model.
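
The abstract does not specify how the orthogonal constraint is computed, so the following is only a minimal illustrative sketch of the general idea, not the authors' implementation. It assumes a plain projection-based split of a feature vector into two orthogonal parts and a squared-cosine penalty that is zero when the parts are orthogonal. The names `orthogonal_split`, `orthogonality_penalty`, `feature`, and `direction` are hypothetical and do not come from the paper.

```python
def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def orthogonal_split(feature, direction):
    """Split `feature` into a part along `direction` and an orthogonal remainder.

    Assumption: a fixed, non-zero `direction` stands in for whatever learned
    mapping a real model would use to pick out the "specific" component.
    """
    scale = dot(feature, direction) / dot(direction, direction)
    specific = [scale * d for d in direction]
    invariant = [f - s for f, s in zip(feature, specific)]
    return invariant, specific


def orthogonality_penalty(a, b, eps=1e-12):
    """Squared cosine similarity: 0 for orthogonal vectors, near 1 for parallel ones."""
    return dot(a, b) ** 2 / (dot(a, a) * dot(b, b) + eps)


feature = [3.0, 4.0, 0.0]
direction = [1.0, 0.0, 0.0]
invariant, specific = orthogonal_split(feature, direction)
penalty = orthogonality_penalty(invariant, specific)
```

In a trained network the two components would typically come from learned projections, and a penalty of this kind would be added to the training loss to push them toward orthogonality. Here the split is exact by construction, so the penalty evaluates to zero.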

Original language	English
Title of host publication	MCFR 2022 - Proceedings of the 1st Workshop on Multimedia Computing towards Fashion Recommendation
Publisher	Association for Computing Machinery, Inc
Pages	21-29
Number of pages	9
ISBN (Electronic)	9781450394987
DOI	https://doi.org/10.1145/3552468.3555362
Publication status	Published - 14 Oct 2022
Externally published	Yes
Event	1st Workshop on Multimedia Computing towards Fashion Recommendation, MCFR 2022 - Lisboa, Portugal. Duration: 14 Oct 2022 → …

Publication series

Name	MCFR 2022 - Proceedings of the 1st Workshop on Multimedia Computing towards Fashion Recommendation

Conference

Conference	1st Workshop on Multimedia Computing towards Fashion Recommendation, MCFR 2022
Country/Territory	Portugal
City	Lisboa
Period	14/10/22 → …

Cite this

Chen, C., Guo, J., Song, B., & Zhang, T. (2022). Orthogonal Vector-Decomposed Disentanglement Network of Interactive Image Retrieval for Fashion Outfit Recommendation. In MCFR 2022 - Proceedings of the 1st Workshop on Multimedia Computing towards Fashion Recommendation (pp. 21-29). (MCFR 2022 - Proceedings of the 1st Workshop on Multimedia Computing towards Fashion Recommendation). Association for Computing Machinery, Inc. https://doi.org/10.1145/3552468.3555362

Chen, Chen ; Guo, Jie ; Song, Bin et al. / Orthogonal Vector-Decomposed Disentanglement Network of Interactive Image Retrieval for Fashion Outfit Recommendation. MCFR 2022 - Proceedings of the 1st Workshop on Multimedia Computing towards Fashion Recommendation. Association for Computing Machinery, Inc, 2022. pp. 21-29 (MCFR 2022 - Proceedings of the 1st Workshop on Multimedia Computing towards Fashion Recommendation).

@inproceedings{b9d94006fb0d45b4af5c8fb58d7a7458,

title = "Orthogonal Vector-Decomposed Disentanglement Network of Interactive Image Retrieval for Fashion Outfit Recommendation",

abstract = "Interactive image retrieval for fashion outfit recommendation is a challenging task that aims to find the desired target image from a multi-modal query (a reference image and a modification text). Previous studies focus on exploring effective feature composition methods to achieve similarity matching between different modalities. However, feature redundancy and semantic inconsistency between modalities introduce much task-irrelevant information. This makes it difficult to correctly identify the particular information to be modified and inevitably introduces noise that leads to suboptimal performance. To this end, we present a novel Orthogonal Vector-Decomposed Disentanglement Network (OVDDN) for image retrieval. It leverages the disentangled parts to learn a controllable denoising embedding space. First, we design an orthogonal disentanglement module, which is applied to both image and text features to decouple them into two independent components (invariant and specific) through orthogonal constraints. A similarity metric loss ensures semantic consistency of paired images. Then, an attention network composes the invariant part of the reference image with the task-related part of the text to match the target image. Finally, a differential feature alignment module maintains cross-modal semantic consistency. Extensive experiments on three benchmark datasets show that OVDDN achieves consistently superior performance. Ablation analyses further verify the effectiveness of the proposed model.",

keywords = "disentanglement learning, feature fusion, image retrieval",

author = "Chen Chen and Jie Guo and Bin Song and Tong Zhang",

note = "Publisher Copyright: {\textcopyright} 2022 ACM.; 1st Workshop on Multimedia Computing towards Fashion Recommendation, MCFR 2022 ; Conference date: 14-10-2022",

year = "2022",

month = oct,

day = "14",

doi = "10.1145/3552468.3555362",

language = "English",

series = "MCFR 2022 - Proceedings of the 1st Workshop on Multimedia Computing towards Fashion Recommendation",

publisher = "Association for Computing Machinery, Inc",

pages = "21--29",

booktitle = "MCFR 2022 - Proceedings of the 1st Workshop on Multimedia Computing towards Fashion Recommendation",

}

Chen, C, Guo, J, Song, B & Zhang, T 2022, Orthogonal Vector-Decomposed Disentanglement Network of Interactive Image Retrieval for Fashion Outfit Recommendation. in MCFR 2022 - Proceedings of the 1st Workshop on Multimedia Computing towards Fashion Recommendation. MCFR 2022 - Proceedings of the 1st Workshop on Multimedia Computing towards Fashion Recommendation, Association for Computing Machinery, Inc, pp. 21-29, 1st Workshop on Multimedia Computing towards Fashion Recommendation, MCFR 2022, Lisboa, Portugal, 14/10/22. https://doi.org/10.1145/3552468.3555362

Orthogonal Vector-Decomposed Disentanglement Network of Interactive Image Retrieval for Fashion Outfit Recommendation. / Chen, Chen; Guo, Jie; Song, Bin et al.
MCFR 2022 - Proceedings of the 1st Workshop on Multimedia Computing towards Fashion Recommendation. Association for Computing Machinery, Inc, 2022. p. 21-29 (MCFR 2022 - Proceedings of the 1st Workshop on Multimedia Computing towards Fashion Recommendation).

TY - GEN

T1 - Orthogonal Vector-Decomposed Disentanglement Network of Interactive Image Retrieval for Fashion Outfit Recommendation

AU - Chen, Chen

AU - Guo, Jie

AU - Song, Bin

AU - Zhang, Tong

PY - 2022/10/14

Y1 - 2022/10/14

AB - Interactive image retrieval for fashion outfit recommendation is a challenging task that aims to find the desired target image from a multi-modal query (a reference image and a modification text). Previous studies focus on exploring effective feature composition methods to achieve similarity matching between different modalities. However, feature redundancy and semantic inconsistency between modalities introduce much task-irrelevant information. This makes it difficult to correctly identify the particular information to be modified and inevitably introduces noise that leads to suboptimal performance. To this end, we present a novel Orthogonal Vector-Decomposed Disentanglement Network (OVDDN) for image retrieval. It leverages the disentangled parts to learn a controllable denoising embedding space. First, we design an orthogonal disentanglement module, which is applied to both image and text features to decouple them into two independent components (invariant and specific) through orthogonal constraints. A similarity metric loss ensures semantic consistency of paired images. Then, an attention network composes the invariant part of the reference image with the task-related part of the text to match the target image. Finally, a differential feature alignment module maintains cross-modal semantic consistency. Extensive experiments on three benchmark datasets show that OVDDN achieves consistently superior performance. Ablation analyses further verify the effectiveness of the proposed model.

KW - disentanglement learning

KW - feature fusion

KW - image retrieval

UR - http://www.scopus.com/inward/record.url?scp=85141087359&partnerID=8YFLogxK

U2 - 10.1145/3552468.3555362

DO - 10.1145/3552468.3555362

M3 - Conference contribution

AN - SCOPUS:85141087359

T3 - MCFR 2022 - Proceedings of the 1st Workshop on Multimedia Computing towards Fashion Recommendation

SP - 21

EP - 29

BT - MCFR 2022 - Proceedings of the 1st Workshop on Multimedia Computing towards Fashion Recommendation

PB - Association for Computing Machinery, Inc

T2 - 1st Workshop on Multimedia Computing towards Fashion Recommendation, MCFR 2022

Y2 - 14 October 2022

ER -

Chen C, Guo J, Song B, Zhang T. Orthogonal Vector-Decomposed Disentanglement Network of Interactive Image Retrieval for Fashion Outfit Recommendation. In MCFR 2022 - Proceedings of the 1st Workshop on Multimedia Computing towards Fashion Recommendation. Association for Computing Machinery, Inc. 2022. p. 21-29. (MCFR 2022 - Proceedings of the 1st Workshop on Multimedia Computing towards Fashion Recommendation). doi: 10.1145/3552468.3555362
