TY - JOUR
T1 - EViT
T2 - Privacy-Preserving Image Retrieval via Encrypted Vision Transformer in Cloud Computing
AU - Feng, Qihua
AU - Li, Peiya
AU - Lu, Zhixun
AU - Li, Chaozhuo
AU - Wang, Zefan
AU - Liu, Zhiquan
AU - Duan, Chunhui
AU - Huang, Feiran
AU - Weng, Jian
AU - Yu, Philip S.
N1 - Publisher Copyright: © 1991-2012 IEEE.
PY - 2024
Y1 - 2024
N2 - Image retrieval systems help users browse and search large image collections in real time. With the rise of cloud computing, retrieval tasks are usually outsourced to cloud servers. However, the cloud scenario poses a daunting privacy-protection challenge, as cloud servers cannot be fully trusted. To this end, image-encryption-based privacy-preserving image retrieval (PPIR) schemes have been developed, which first extract features from cipher-images and then build retrieval models based on these features. Yet most existing PPIR approaches extract shallow features and design trivial unsupervised retrieval models, resulting in insufficient expressiveness for the cipher-images. In this paper, we propose a novel paradigm named Encrypted Vision Transformer (EViT), which advances the discriminative representation capability of cipher-images. First, to capture comprehensive ruled information, we extract multi-level local length sequence and global Huffman-code frequency features from cipher-images, which are encrypted by permutation encryption, sign encryption, and stream cipher during the JPEG compression process. Second, we design a modified self-supervised Vision Transformer with Huffman-embedding and propose two robust data augmentations on cipher-images to improve the representation power of the retrieval model. Moreover, our proposal can be easily adapted to unsupervised or supervised settings. Extensive experiments reveal that EViT achieves both excellent encryption and retrieval performance, outperforming current schemes in retrieval accuracy by large margins while protecting image privacy effectively. Code is publicly available at https://github.com/onlinehuazai/EViT.
KW - Image retrieval
KW - JPEG
KW - privacy-preserving
KW - self-supervised learning
KW - vision transformer
UR - http://www.scopus.com/inward/record.url?scp=85187015256&partnerID=8YFLogxK
U2 - 10.1109/TCSVT.2024.3370668
DO - 10.1109/TCSVT.2024.3370668
M3 - Article
AN - SCOPUS:85187015256
SN - 1051-8215
VL - 34
SP - 7467
EP - 7483
JO - IEEE Transactions on Circuits and Systems for Video Technology
JF - IEEE Transactions on Circuits and Systems for Video Technology
IS - 8
ER -