跳到主要导航 跳到搜索 跳到主要内容

Exploring Best Arm with Top Reward-Cost Ratio in Stochastic Bandits

  • Zhida Qin
  • , Xiaoying Gan
  • , Jia Liu
  • , Hongqiu Wu
  • , Haiming Jin
  • , Luoyi Fu
  • Shanghai Jiao Tong University
  • Iowa State University

科研成果: 书/报告/会议事项章节会议稿件同行评审

摘要

The best arm identification problem in multi-armed bandit model has been widely applied into many practical applications, such as spectrum sensing, online advertising, and cloud computing. Although lots of works have been devoted into this area, most of them do not consider the cost of pulling actions, i.e., a player has to pay some cost when she pulls an arm. Motivated by this, we study a ratio-based best arm identification problem, where each arm is associated with a random reward as well as a random cost. For any δ (0,1), with probability at least 1-δ, the player aims to find the optimal arm with the largest ratio of expected reward to expected cost using as few samplings as possible. To solve this problem, we propose three algorithms: 1) a genie-aided algorithm GA; 2) the successive elimination algorithm with unknown gaps SEUG; 3) the successive elimination algorithm with unknown gaps and variance information SEUG-V, where gaps denote the differences between the optimal arm and the suboptimal arms. We show that for all three algorithms, the sample complexities, i.e., the pulling times for all arms, grow logarithmically as \frac{1}{\delta } increases. Moreover, compared to existing works, the running of our elimination-type algorithms is independent of the arm-related parameters, which is more practical. In addition, we also provide a fundamental lower bound for sample complexities of any algorithms under Bernoulli distributions, and show that the sample complexities of the proposed three algorithms match that of the lower bound in the sense of \log \frac{1}{\delta }. Finally, we validate our theoretical results through numerical experiments.

源语言英语
主期刊名INFOCOM 2020 - IEEE Conference on Computer Communications
出版商Institute of Electrical and Electronics Engineers Inc.
159-168
页数10
ISBN(电子版)9781728164120
DOI
出版状态已出版 - 7月 2020
已对外发布
活动38th IEEE Conference on Computer Communications, INFOCOM 2020 - Toronto, 加拿大
期限: 6 7月 20209 7月 2020

出版系列

姓名Proceedings - IEEE INFOCOM
2020-July
ISSN(印刷版)0743-166X

会议

会议38th IEEE Conference on Computer Communications, INFOCOM 2020
国家/地区加拿大
Toronto
时期6/07/209/07/20

指纹

探究 'Exploring Best Arm with Top Reward-Cost Ratio in Stochastic Bandits' 的科研主题。它们共同构成独一无二的指纹。

引用此