TY - GEN
T1 - Exploring Best Arm with Top Reward-Cost Ratio in Stochastic Bandits
AU - Qin, Zhida
AU - Gan, Xiaoying
AU - Liu, Jia
AU - Wu, Hongqiu
AU - Jin, Haiming
AU - Fu, Luoyi
N1 - Publisher Copyright:
© 2020 IEEE.
PY - 2020/7
Y1 - 2020/7
N2 - The best arm identification problem in the multi-armed bandit model has been widely applied in many practical settings, such as spectrum sensing, online advertising, and cloud computing. Although many works have been devoted to this area, most of them do not consider the cost of pulling actions, i.e., a player has to pay some cost when she pulls an arm. Motivated by this, we study a ratio-based best arm identification problem, where each arm is associated with a random reward as well as a random cost. For any δ ∈ (0,1), with probability at least 1-δ, the player aims to find the optimal arm with the largest ratio of expected reward to expected cost using as few samples as possible. To solve this problem, we propose three algorithms: 1) a genie-aided algorithm GA; 2) the successive elimination algorithm with unknown gaps SEUG; and 3) the successive elimination algorithm with unknown gaps and variance information SEUG-V, where gaps denote the differences between the optimal arm and the suboptimal arms. We show that for all three algorithms, the sample complexities, i.e., the total numbers of arm pulls, grow logarithmically as \frac{1}{\delta} increases. Moreover, compared to existing works, the operation of our elimination-type algorithms is independent of arm-related parameters, which is more practical. In addition, we provide a fundamental lower bound on the sample complexity of any algorithm under Bernoulli distributions, and show that the sample complexities of the three proposed algorithms match the lower bound in the sense of \log \frac{1}{\delta}. Finally, we validate our theoretical results through numerical experiments.
AB - The best arm identification problem in the multi-armed bandit model has been widely applied in many practical settings, such as spectrum sensing, online advertising, and cloud computing. Although many works have been devoted to this area, most of them do not consider the cost of pulling actions, i.e., a player has to pay some cost when she pulls an arm. Motivated by this, we study a ratio-based best arm identification problem, where each arm is associated with a random reward as well as a random cost. For any δ ∈ (0,1), with probability at least 1-δ, the player aims to find the optimal arm with the largest ratio of expected reward to expected cost using as few samples as possible. To solve this problem, we propose three algorithms: 1) a genie-aided algorithm GA; 2) the successive elimination algorithm with unknown gaps SEUG; and 3) the successive elimination algorithm with unknown gaps and variance information SEUG-V, where gaps denote the differences between the optimal arm and the suboptimal arms. We show that for all three algorithms, the sample complexities, i.e., the total numbers of arm pulls, grow logarithmically as \frac{1}{\delta} increases. Moreover, compared to existing works, the operation of our elimination-type algorithms is independent of arm-related parameters, which is more practical. In addition, we provide a fundamental lower bound on the sample complexity of any algorithm under Bernoulli distributions, and show that the sample complexities of the three proposed algorithms match the lower bound in the sense of \log \frac{1}{\delta}. Finally, we validate our theoretical results through numerical experiments.
UR - http://www.scopus.com/inward/record.url?scp=85090284500&partnerID=8YFLogxK
U2 - 10.1109/INFOCOM41043.2020.9155362
DO - 10.1109/INFOCOM41043.2020.9155362
M3 - Conference contribution
AN - SCOPUS:85090284500
T3 - Proceedings - IEEE INFOCOM
SP - 159
EP - 168
BT - INFOCOM 2020 - IEEE Conference on Computer Communications
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 38th IEEE Conference on Computer Communications, INFOCOM 2020
Y2 - 6 July 2020 through 9 July 2020
ER -