TY - JOUR
T1 - Subject-Level Membership Inference Attack via Data Augmentation and Model Discrepancy
AU - Liu, Yimin
AU - Jiang, Peng
AU - Zhu, Liehuang
N1 - Publisher Copyright:
© 2023 IEEE.
PY - 2023
Y1 - 2023
N2 - Federated learning (FL) models are vulnerable to membership inference attacks (MIAs), and the requirement of individual privacy motivates the protection of subjects whose data are distributed across multiple users in the cross-silo FL setting. In this paper, we propose a subject-level membership inference attack based on data augmentation and model discrepancy. It can effectively infer whether the data distribution of the target subject has been sampled and used for training by specific federated users, even if other users may also sample from the same subject and use it as part of their training sets. Specifically, the adversary uses a generative adversarial network (GAN) to perform data augmentation on a small amount of prior federation-associated information. Subsequently, the adversary merges two different outputs from the global model and the tested user model using an optimal feature construction method. We simulate a controlled federation configuration and conduct extensive experiments on real datasets that include both image and categorical data. Results show that the area under the curve (AUC) is improved by 12.6% to 16.8% compared with the classical membership inference attack, at the expense of the test accuracy on the GAN-augmented data, which is at most 3.5% lower than that on the real test data. We also explore the degree of privacy leakage of overfitted models versus well-generalized models in the cross-silo FL setting and conclude experimentally that the former are more likely to leak individual privacy, with a subject-level degradation rate of up to 0.43. Finally, we present two possible defense mechanisms to attenuate this newly discovered privacy risk.
AB - Federated learning (FL) models are vulnerable to membership inference attacks (MIAs), and the requirement of individual privacy motivates the protection of subjects whose data are distributed across multiple users in the cross-silo FL setting. In this paper, we propose a subject-level membership inference attack based on data augmentation and model discrepancy. It can effectively infer whether the data distribution of the target subject has been sampled and used for training by specific federated users, even if other users may also sample from the same subject and use it as part of their training sets. Specifically, the adversary uses a generative adversarial network (GAN) to perform data augmentation on a small amount of prior federation-associated information. Subsequently, the adversary merges two different outputs from the global model and the tested user model using an optimal feature construction method. We simulate a controlled federation configuration and conduct extensive experiments on real datasets that include both image and categorical data. Results show that the area under the curve (AUC) is improved by 12.6% to 16.8% compared with the classical membership inference attack, at the expense of the test accuracy on the GAN-augmented data, which is at most 3.5% lower than that on the real test data. We also explore the degree of privacy leakage of overfitted models versus well-generalized models in the cross-silo FL setting and conclude experimentally that the former are more likely to leak individual privacy, with a subject-level degradation rate of up to 0.43. Finally, we present two possible defense mechanisms to attenuate this newly discovered privacy risk.
KW - Federated learning
KW - generative adversarial networks
KW - privacy degradation
KW - subject-level membership inference attacks
UR - http://www.scopus.com/inward/record.url?scp=85173325212&partnerID=8YFLogxK
U2 - 10.1109/TIFS.2023.3318950
DO - 10.1109/TIFS.2023.3318950
M3 - Article
AN - SCOPUS:85173325212
SN - 1556-6013
VL - 18
SP - 5848
EP - 5859
JO - IEEE Transactions on Information Forensics and Security
JF - IEEE Transactions on Information Forensics and Security
ER -