Distributed Robust Bandits With Efficient Communication

Ao Wang; Zhida Qin; Lu Zheng; Dapeng Li; Lin Gao

doi:10.1109/TNSE.2022.3231320

Ao Wang, Zhida Qin^*, Lu Zheng, Dapeng Li^*, Lin Gao

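^*Corresponding author for this work

School of Computer Science

Research output: Contribution to journal › Article › peer-review

1 Citation (Scopus)

Abstract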

The Distributed Multi-Armed Bandit (DMAB) is a powerful framework for studying many network problems. The DMAB is typically studied in a paradigm where signals activate each agent with a fixed probability, and the rewards revealed to agents are assumed to be either generated from fixed and unknown distributions (stochastic rewards) or arbitrarily manipulated by an adversary (adversarial rewards). However, this paradigm fails to capture the dynamics and uncertainties of many real-world applications, where the signal that activates an agent may not follow any distribution, and the rewards might be partially stochastic and partially adversarial. Motivated by this, we study the asynchronously stochastic DMAB problem with adversarial corruptions, where each agent is activated arbitrarily and rewards initially sampled from distributions might be corrupted by an adversary. The objectives are to simultaneously minimize the regret and the communication cost while remaining robust to corruption. To address these issues, we propose a Robust and Distributed Active Arm Elimination algorithm, namely RDAAE, which only needs to transmit one real number (e.g., an arm index or a reward) per communication. We theoretically prove that the regret and the communication cost degrade smoothly as the corruption level increases.

Original language	English
Pages (from-to)	1586-1598
Number of pages	13
Journal	IEEE Transactions on Network Science and Engineering
Volume	10
Issue number	3
DOI	https://doi.org/10.1109/TNSE.2022.3231320
Publication status	Published - 1 May 2023

Cite this

@article{0406be81fc83490aa69dfa828e477942,

title = "Distributed Robust Bandits With Efficient Communication",

abstract = "The Distributed Multi-Armed Bandit (DMAB) is a powerful framework for studying many network problems. The DMAB is typically studied in a paradigm, where signals activate each agent with a fixed probability, and the rewards revealed to agents are assumed to be generated from fixed and unknown distributions, i.e., stochastic rewards, or arbitrarily manipulated by an adversary, i.e., adversarial rewards. However, this paradigm fails to capture the dynamics and uncertainties of many real-world applications, where the signal that activates an agent, may not follow any distribution, and the rewards might be partially stochastic and partially adversarial. Motivated by this, we study the asynchronously stochastic DMAB problem with adversarial corruptions where the agent is activated arbitrarily, and rewards initially sampled from distributions might be corrupted by an adversary. The objectives are to simultaneously minimize the regret and communication cost, while robust to corruption. To address all these issues, we propose a Robust and Distributed Active Arm Elimination algorithm, namely RDAAE, which only needs to transmit one real number (e.g., an arm index, or a reward) per communication. We theoretically prove that the performance of regret and communication cost smoothly degrades when the corruption level increases.",

keywords = "Adversarial corruptions, Cooperation, Distributed multi-agent bandit (DMAB), Robust learning",

author = "Ao Wang and Zhida Qin and Lu Zheng and Dapeng Li and Lin Gao",

note = "Publisher Copyright: {\textcopyright} 2013 IEEE.",

year = "2023",

month = may,

day = "1",

doi = "10.1109/TNSE.2022.3231320",

language = "English",

volume = "10",

pages = "1586--1598",

journal = "IEEE Transactions on Network Science and Engineering",

issn = "2327-4697",

publisher = "IEEE Computer Society",

number = "3",

}

TY - JOUR

T1 - Distributed Robust Bandits With Efficient Communication

AU - Wang, Ao

AU - Qin, Zhida

AU - Zheng, Lu

AU - Li, Dapeng

AU - Gao, Lin

PY - 2023/5/1

Y1 - 2023/5/1

N2 - The Distributed Multi-Armed Bandit (DMAB) is a powerful framework for studying many network problems. The DMAB is typically studied in a paradigm, where signals activate each agent with a fixed probability, and the rewards revealed to agents are assumed to be generated from fixed and unknown distributions, i.e., stochastic rewards, or arbitrarily manipulated by an adversary, i.e., adversarial rewards. However, this paradigm fails to capture the dynamics and uncertainties of many real-world applications, where the signal that activates an agent, may not follow any distribution, and the rewards might be partially stochastic and partially adversarial. Motivated by this, we study the asynchronously stochastic DMAB problem with adversarial corruptions where the agent is activated arbitrarily, and rewards initially sampled from distributions might be corrupted by an adversary. The objectives are to simultaneously minimize the regret and communication cost, while robust to corruption. To address all these issues, we propose a Robust and Distributed Active Arm Elimination algorithm, namely RDAAE, which only needs to transmit one real number (e.g., an arm index, or a reward) per communication. We theoretically prove that the performance of regret and communication cost smoothly degrades when the corruption level increases.

AB - The Distributed Multi-Armed Bandit (DMAB) is a powerful framework for studying many network problems. The DMAB is typically studied in a paradigm, where signals activate each agent with a fixed probability, and the rewards revealed to agents are assumed to be generated from fixed and unknown distributions, i.e., stochastic rewards, or arbitrarily manipulated by an adversary, i.e., adversarial rewards. However, this paradigm fails to capture the dynamics and uncertainties of many real-world applications, where the signal that activates an agent, may not follow any distribution, and the rewards might be partially stochastic and partially adversarial. Motivated by this, we study the asynchronously stochastic DMAB problem with adversarial corruptions where the agent is activated arbitrarily, and rewards initially sampled from distributions might be corrupted by an adversary. The objectives are to simultaneously minimize the regret and communication cost, while robust to corruption. To address all these issues, we propose a Robust and Distributed Active Arm Elimination algorithm, namely RDAAE, which only needs to transmit one real number (e.g., an arm index, or a reward) per communication. We theoretically prove that the performance of regret and communication cost smoothly degrades when the corruption level increases.

KW - Adversarial corruptions

KW - Cooperation

KW - Distributed multi-agent bandit (DMAB)

KW - Robust learning

UR - http://www.scopus.com/inward/record.url?scp=85146246067&partnerID=8YFLogxK

U2 - 10.1109/TNSE.2022.3231320

DO - 10.1109/TNSE.2022.3231320

M3 - Article

AN - SCOPUS:85146246067

SN - 2327-4697

VL - 10

SP - 1586

EP - 1598

JO - IEEE Transactions on Network Science and Engineering

JF - IEEE Transactions on Network Science and Engineering

IS - 3

ER -
