TY - GEN
T1 - Data-Free Encoder Stealing Attack in Self-supervised Learning
AU - Zhang, Chuan
AU - Ren, Xuhao
AU - Liang, Haotian
AU - Fan, Qing
AU - Tang, Xiangyun
AU - Li, Chunhai
AU - Zhu, Liehuang
AU - Wang, Yajie
N1 - Publisher Copyright:
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2025.
PY - 2025
Y1 - 2025
N2 - Self-supervised learning has developed rapidly by making full use of unlabeled images, pre-training encoders on large amounts of unlabeled data, which has led to the rise of Encoder as a Service (EaaS). Because training such encoders demands large amounts of data and computing resources, pre-trained encoders are at risk of stealing attacks, which offer an easy and cheap way to acquire an encoder’s functionality. Conventional attacks against encoders assume the adversary possesses a surrogate dataset whose distribution is similar to that of the proprietary data used to train the target encoder. In practice, this assumption rarely holds, as obtaining such a surrogate dataset is expensive and difficult. In this paper, we propose a novel data-free encoder stealing attack called DaES. Specifically, we introduce a generator training scheme that crafts synthetic inputs to minimize the distance between the embeddings of the target encoder and the surrogate encoder, enabling the surrogate encoder to mimic the behavior of the target encoder. Furthermore, we employ gradient estimation methods to overcome the limited black-box access to the target encoder, thereby improving the attack’s efficiency. Experiments across various encoders and datasets show that our attack improves accuracy over the state of the art by up to 6.20%.
AB - Self-supervised learning has developed rapidly by making full use of unlabeled images, pre-training encoders on large amounts of unlabeled data, which has led to the rise of Encoder as a Service (EaaS). Because training such encoders demands large amounts of data and computing resources, pre-trained encoders are at risk of stealing attacks, which offer an easy and cheap way to acquire an encoder’s functionality. Conventional attacks against encoders assume the adversary possesses a surrogate dataset whose distribution is similar to that of the proprietary data used to train the target encoder. In practice, this assumption rarely holds, as obtaining such a surrogate dataset is expensive and difficult. In this paper, we propose a novel data-free encoder stealing attack called DaES. Specifically, we introduce a generator training scheme that crafts synthetic inputs to minimize the distance between the embeddings of the target encoder and the surrogate encoder, enabling the surrogate encoder to mimic the behavior of the target encoder. Furthermore, we employ gradient estimation methods to overcome the limited black-box access to the target encoder, thereby improving the attack’s efficiency. Experiments across various encoders and datasets show that our attack improves accuracy over the state of the art by up to 6.20%.
KW - Data-free
KW - Encoder as a Service
KW - Encoder Stealing Attacks
KW - Self-supervised learning
UR - http://www.scopus.com/inward/record.url?scp=85218968154&partnerID=8YFLogxK
U2 - 10.1007/978-981-96-1525-4_6
DO - 10.1007/978-981-96-1525-4_6
M3 - Conference contribution
AN - SCOPUS:85218968154
SN - 9789819615247
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 100
EP - 120
BT - Algorithms and Architectures for Parallel Processing - 24th International Conference, ICA3PP 2024, Proceedings
A2 - Zhu, Tianqing
A2 - Li, Jin
A2 - Castiglione, Aniello
PB - Springer Science and Business Media Deutschland GmbH
T2 - 24th International Conference on Algorithms and Architectures for Parallel Processing, ICA3PP 2024
Y2 - 29 October 2024 through 31 October 2024
ER -