TY - GEN
T1 - Deep Foreground-Background Weighted Cross-modal Hashing
AU - Zhao, Guanqi
AU - Mao, Xian Ling
AU - Tu, Rong Cheng
AU - Ji, Wenjin
AU - Huang, Heyan
N1 - Publisher Copyright:
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2025.
PY - 2025
Y1 - 2025
N2 - With the rapid growth of multi-modal data, deep cross-modal hashing algorithms provide an effective solution for cross-modal retrieval tasks thanks to their efficient retrieval speed and low storage consumption. To extract structured information from raw data efficiently, existing supervised cross-modal hashing methods generally focus on feature extraction of global information; however, these methods ignore the weight differentiation between foreground and background information in an image. To address this issue, we propose a novel Deep Foreground-Background Weighted Cross-Modal Hashing (DFBWH) method for supervised cross-modal retrieval. Specifically, the proposed method first performs object detection on the original image and selects candidate regions as foreground target entities. Then, it uses the semantic interactions in the textual descriptions and tagging information as evaluation criteria and employs CLIP to measure the matching degree of the candidate regions. Finally, under the supervision of the category label information, a hash loss function is used to obtain high-quality hash codes. Extensive experiments on two benchmark datasets demonstrate that DFBWH achieves better performance than state-of-the-art baselines.
AB - With the rapid growth of multi-modal data, deep cross-modal hashing algorithms provide an effective solution for cross-modal retrieval tasks thanks to their efficient retrieval speed and low storage consumption. To extract structured information from raw data efficiently, existing supervised cross-modal hashing methods generally focus on feature extraction of global information; however, these methods ignore the weight differentiation between foreground and background information in an image. To address this issue, we propose a novel Deep Foreground-Background Weighted Cross-Modal Hashing (DFBWH) method for supervised cross-modal retrieval. Specifically, the proposed method first performs object detection on the original image and selects candidate regions as foreground target entities. Then, it uses the semantic interactions in the textual descriptions and tagging information as evaluation criteria and employs CLIP to measure the matching degree of the candidate regions. Finally, under the supervision of the category label information, a hash loss function is used to obtain high-quality hash codes. Extensive experiments on two benchmark datasets demonstrate that DFBWH achieves better performance than state-of-the-art baselines.
KW - Cross-modal retrieval
KW - Deep hashing
KW - Foreground-Background Weighted
UR - http://www.scopus.com/inward/record.url?scp=85210072876&partnerID=8YFLogxK
U2 - 10.1007/978-981-97-9437-9_34
DO - 10.1007/978-981-97-9437-9_34
M3 - Conference contribution
AN - SCOPUS:85210072876
SN - 9789819794362
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 433
EP - 445
BT - Natural Language Processing and Chinese Computing - 13th National CCF Conference, NLPCC 2024, Proceedings
A2 - Wong, Derek F.
A2 - Wei, Zhongyu
A2 - Yang, Muyun
PB - Springer Science and Business Media Deutschland GmbH
T2 - 13th CCF International Conference on Natural Language Processing and Chinese Computing, NLPCC 2024
Y2 - 1 November 2024 through 3 November 2024
ER -