TY - JOUR
T1 - Distributed Robust Bandits With Efficient Communication
AU - Wang, Ao
AU - Qin, Zhida
AU - Zheng, Lu
AU - Li, Dapeng
AU - Gao, Lin
N1 - Publisher Copyright:
© 2013 IEEE.
PY - 2023/5/1
Y1 - 2023/5/1
N2 - The Distributed Multi-Armed Bandit (DMAB) is a powerful framework for studying many network problems. The DMAB is typically studied in a paradigm where signals activate each agent with a fixed probability, and the rewards revealed to agents are assumed either to be generated from fixed but unknown distributions, i.e., stochastic rewards, or to be arbitrarily manipulated by an adversary, i.e., adversarial rewards. However, this paradigm fails to capture the dynamics and uncertainties of many real-world applications, where the signal that activates an agent may not follow any distribution, and the rewards might be partially stochastic and partially adversarial. Motivated by this, we study the asynchronous stochastic DMAB problem with adversarial corruptions, where agents are activated arbitrarily and rewards initially sampled from distributions may be corrupted by an adversary. The objectives are to simultaneously minimize the regret and the communication cost while remaining robust to corruption. To address all these issues, we propose a Robust and Distributed Active Arm Elimination algorithm, RDAAE, which only needs to transmit one real number (e.g., an arm index or a reward) per communication. We theoretically prove that the regret and communication cost degrade smoothly as the corruption level increases.
AB - The Distributed Multi-Armed Bandit (DMAB) is a powerful framework for studying many network problems. The DMAB is typically studied in a paradigm where signals activate each agent with a fixed probability, and the rewards revealed to agents are assumed either to be generated from fixed but unknown distributions, i.e., stochastic rewards, or to be arbitrarily manipulated by an adversary, i.e., adversarial rewards. However, this paradigm fails to capture the dynamics and uncertainties of many real-world applications, where the signal that activates an agent may not follow any distribution, and the rewards might be partially stochastic and partially adversarial. Motivated by this, we study the asynchronous stochastic DMAB problem with adversarial corruptions, where agents are activated arbitrarily and rewards initially sampled from distributions may be corrupted by an adversary. The objectives are to simultaneously minimize the regret and the communication cost while remaining robust to corruption. To address all these issues, we propose a Robust and Distributed Active Arm Elimination algorithm, RDAAE, which only needs to transmit one real number (e.g., an arm index or a reward) per communication. We theoretically prove that the regret and communication cost degrade smoothly as the corruption level increases.
KW - Adversarial corruptions
KW - Cooperation
KW - Distributed multi-agent bandit (DMAB)
KW - Robust learning
UR - http://www.scopus.com/inward/record.url?scp=85146246067&partnerID=8YFLogxK
U2 - 10.1109/TNSE.2022.3231320
DO - 10.1109/TNSE.2022.3231320
M3 - Article
AN - SCOPUS:85146246067
SN - 2327-4697
VL - 10
SP - 1586
EP - 1598
JO - IEEE Transactions on Network Science and Engineering
JF - IEEE Transactions on Network Science and Engineering
IS - 3
ER -