TY - JOUR
T1 - Distributed Robust Bandits With Efficient Communication
AU - Wang, Ao
AU - Qin, Zhida
AU - Zheng, Lu
AU - Li, Dapeng
AU - Gao, Lin
N1 - Publisher Copyright:
© 2013 IEEE.
PY - 2023/5/1
Y1 - 2023/5/1
N2 - The Distributed Multi-Armed Bandit (DMAB) is a powerful framework for studying many network problems. The DMAB is typically studied in a paradigm where signals activate each agent with a fixed probability, and the rewards revealed to agents are assumed either to be generated from fixed but unknown distributions, i.e., stochastic rewards, or to be arbitrarily manipulated by an adversary, i.e., adversarial rewards. However, this paradigm fails to capture the dynamics and uncertainties of many real-world applications, where the signal that activates an agent may not follow any distribution, and the rewards might be partially stochastic and partially adversarial. Motivated by this, we study the asynchronous stochastic DMAB problem with adversarial corruptions, where agents are activated arbitrarily and rewards initially sampled from distributions may be corrupted by an adversary. The objectives are to simultaneously minimize the regret and the communication cost while remaining robust to corruption. To address all these issues, we propose a Robust and Distributed Active Arm Elimination algorithm, RDAAE, which only needs to transmit one real number (e.g., an arm index or a reward) per communication. We theoretically prove that the regret and communication cost degrade smoothly as the corruption level increases.
AB - The Distributed Multi-Armed Bandit (DMAB) is a powerful framework for studying many network problems. The DMAB is typically studied in a paradigm where signals activate each agent with a fixed probability, and the rewards revealed to agents are assumed either to be generated from fixed but unknown distributions, i.e., stochastic rewards, or to be arbitrarily manipulated by an adversary, i.e., adversarial rewards. However, this paradigm fails to capture the dynamics and uncertainties of many real-world applications, where the signal that activates an agent may not follow any distribution, and the rewards might be partially stochastic and partially adversarial. Motivated by this, we study the asynchronous stochastic DMAB problem with adversarial corruptions, where agents are activated arbitrarily and rewards initially sampled from distributions may be corrupted by an adversary. The objectives are to simultaneously minimize the regret and the communication cost while remaining robust to corruption. To address all these issues, we propose a Robust and Distributed Active Arm Elimination algorithm, RDAAE, which only needs to transmit one real number (e.g., an arm index or a reward) per communication. We theoretically prove that the regret and communication cost degrade smoothly as the corruption level increases.
KW - Adversarial corruptions
KW - Cooperation
KW - Distributed multi-agent bandit (DMAB)
KW - Robust learning
UR - http://www.scopus.com/inward/record.url?scp=85146246067&partnerID=8YFLogxK
U2 - 10.1109/TNSE.2022.3231320
DO - 10.1109/TNSE.2022.3231320
M3 - Article
AN - SCOPUS:85146246067
SN - 2327-4697
VL - 10
SP - 1586
EP - 1598
JO - IEEE Transactions on Network Science and Engineering
JF - IEEE Transactions on Network Science and Engineering
IS - 3
ER -