TY - JOUR
T1 - Cost-aware Best Arm Identification in Stochastic Bandits
AU - Qin, Zhida
AU - Xue, Wenhao
AU - Zheng, Lu
AU - Gan, Xiaoying
AU - Wu, Hongqiu
AU - Jin, Haiming
AU - Fu, Luoyi
N1 - Publisher Copyright:
© 2025 Copyright held by the owner/author(s). Publication rights licensed to ACM.
PY - 2025/4/16
Y1 - 2025/4/16
N2 - The best arm identification problem in multi-armed bandit model has been widely applied into many practical applications, such as spectrum sensing, online advertising, and cloud computing. Although lots of works have been devoted into this area, most of them do not consider the cost of pulling actions, i.e., a player has to pay some cost when she pulls an arm. Motivated by this, we study a ratio-based best arm identification problem, where each arm is associated with a random reward as well as a random cost. For any , with probability at least , the player aims to find the arm with the largest ratio of expected reward to expected cost using as few samplings as possible. Specifically, we consider two settings: (1) the precise setting, i.e., identifying the precise optimal one; (2) the Probably Approximate Correct (PAC) setting, which identifies the -optimal one. For the precise setting, we design the elimination-type algorithms and provide a fundamental lower bound which asymptotically matches the upper bound, while in the PAC setting, an UCB-type algorithm which amed -RCB algorithm is proposed. We show that for all algorithms, the sample complexities, i.e., the pulling times for all arms, grow logarithmically as increases. Moreover, compared to existing works, the running of our algorithms is independent of the arm-related parameters, which is more practical. Finally, we validate our theoretical results through numerical experiments.
AB - The best arm identification problem in multi-armed bandit model has been widely applied into many practical applications, such as spectrum sensing, online advertising, and cloud computing. Although lots of works have been devoted into this area, most of them do not consider the cost of pulling actions, i.e., a player has to pay some cost when she pulls an arm. Motivated by this, we study a ratio-based best arm identification problem, where each arm is associated with a random reward as well as a random cost. For any , with probability at least , the player aims to find the arm with the largest ratio of expected reward to expected cost using as few samplings as possible. Specifically, we consider two settings: (1) the precise setting, i.e., identifying the precise optimal one; (2) the Probably Approximate Correct (PAC) setting, which identifies the -optimal one. For the precise setting, we design the elimination-type algorithms and provide a fundamental lower bound which asymptotically matches the upper bound, while in the PAC setting, an UCB-type algorithm which amed -RCB algorithm is proposed. We show that for all algorithms, the sample complexities, i.e., the pulling times for all arms, grow logarithmically as increases. Moreover, compared to existing works, the running of our algorithms is independent of the arm-related parameters, which is more practical. Finally, we validate our theoretical results through numerical experiments.
KW - Best arm identification-optimal
KW - Lower bound
KW - Sample complexities
UR - http://www.scopus.com/inward/record.url?scp=105003442486&partnerID=8YFLogxK
U2 - 10.1145/3712290
DO - 10.1145/3712290
M3 - Article
AN - SCOPUS:105003442486
SN - 2157-6904
VL - 16
JO - ACM Transactions on Intelligent Systems and Technology
JF - ACM Transactions on Intelligent Systems and Technology
IS - 2
M1 - 42
ER -