Quantization Enabled Differential Privacy in Bandit Games With Cooperative Players

Yeming Lin, Kun Liu*, Ilai Bistritz, Qian Ma, Yuanqing Xia

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

This paper addresses the bandit game problem subject to privacy leakage, where cooperative players aim to learn the optimal action profile that minimizes the global cost. The players do not have closed-form expressions for their payoff functions and receive only feedback on their local costs. We propose a privacy-preserving distributed bandit learning algorithm based on the residual gradient estimator, which uses stochastic quantization with a binary randomized response scheme to mask action profile estimates before communication. The theoretical analysis demonstrates that our algorithm achieves an expected regret of order O(T^{3/4}) and preserves ε_dp-differential privacy for the players.
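The masking step named in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm: it only shows the generic textbook combination of a one-bit stochastic quantizer with binary randomized response for a single scalar. The function names, the bounds `lo` and `hi`, and the privacy parameter `eps` are all assumptions introduced for illustration.

```python
import math
import random


def stochastic_quantize(x, lo, hi, rng):
    """Map x in [lo, hi] to one bit that equals 1 with probability (x - lo) / (hi - lo)."""
    q = (x - lo) / (hi - lo)
    q = min(1.0, max(0.0, q))  # clip so out-of-range inputs still give a valid probability
    return 1 if rng.random() < q else 0


def randomized_response(bit, eps, rng):
    """Keep the bit with probability e^eps / (1 + e^eps), otherwise flip it."""
    p_keep = math.exp(eps) / (1.0 + math.exp(eps))
    return bit if rng.random() < p_keep else 1 - bit


def debias(noisy_bit, eps, lo, hi):
    """Turn a received noisy bit into an unbiased estimate of the original scalar."""
    p_keep = math.exp(eps) / (1.0 + math.exp(eps))
    # E[noisy_bit] = (2 * p_keep - 1) * q + (1 - p_keep); invert this for q.
    q_hat = (noisy_bit - (1.0 - p_keep)) / (2.0 * p_keep - 1.0)
    return lo + q_hat * (hi - lo)


def privatize(x, lo, hi, eps, rng):
    """Hypothetical helper: quantize, then randomize, returning the bit to transmit."""
    return randomized_response(stochastic_quantize(x, lo, hi, rng), eps, rng)


if __name__ == "__main__":
    rng = random.Random(0)
    x, lo, hi, eps = 0.3, -1.0, 1.0, 1.0
    estimates = [debias(privatize(x, lo, hi, eps, rng), eps, lo, hi) for _ in range(100_000)]
    print(sum(estimates) / len(estimates))  # averages toward x as the sample count grows
```

In this sketch the transmitted bit equals 1 with a probability that always lies between 1 - p_keep and p_keep, so the likelihood ratio between any two inputs is at most e^eps. A single debiased estimate is very noisy, and only the average over many rounds approaches the true value. How the paper couples such a mechanism with the residual gradient estimator is not stated in the abstract and is not modeled here.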

Original language: English
Journal: IEEE Transactions on Automatic Control
Publication status: Accepted/In press - 2025
Externally published: Yes

Keywords

  • Bandit games
  • cooperative optimization
  • privacy preservation
  • stochastic quantization
