TY - GEN
T1 - Exploring Best Arm with Top Reward-Cost Ratio in Stochastic Bandits
AU - Qin, Zhida
AU - Gan, Xiaoying
AU - Liu, Jia
AU - Wu, Hongqiu
AU - Jin, Haiming
AU - Fu, Luoyi
N1 - Publisher Copyright:
© 2020 IEEE.
PY - 2020/7
Y1 - 2020/7
N2 - The best arm identification problem in the multi-armed bandit model has been widely applied in many practical settings, such as spectrum sensing, online advertising, and cloud computing. Although many works have been devoted to this area, most of them do not consider the cost of pulling actions, i.e., a player has to pay some cost when she pulls an arm. Motivated by this, we study a ratio-based best arm identification problem, where each arm is associated with a random reward as well as a random cost. For any δ ∈ (0,1), with probability at least 1-δ, the player aims to find the optimal arm with the largest ratio of expected reward to expected cost using as few samples as possible. To solve this problem, we propose three algorithms: 1) a genie-aided algorithm GA; 2) the successive elimination algorithm with unknown gaps SEUG; and 3) the successive elimination algorithm with unknown gaps and variance information SEUG-V, where gaps denote the differences between the optimal arm and the suboptimal arms. We show that for all three algorithms, the sample complexities, i.e., the total numbers of arm pulls, grow logarithmically as \frac{1}{\delta} increases. Moreover, compared to existing works, the operation of our elimination-type algorithms is independent of arm-related parameters, which is more practical. In addition, we provide a fundamental lower bound on the sample complexity of any algorithm under Bernoulli distributions, and show that the sample complexities of the three proposed algorithms match the lower bound in the sense of \log \frac{1}{\delta}. Finally, we validate our theoretical results through numerical experiments.
AB - The best arm identification problem in the multi-armed bandit model has been widely applied in many practical settings, such as spectrum sensing, online advertising, and cloud computing. Although many works have been devoted to this area, most of them do not consider the cost of pulling actions, i.e., a player has to pay some cost when she pulls an arm. Motivated by this, we study a ratio-based best arm identification problem, where each arm is associated with a random reward as well as a random cost. For any δ ∈ (0,1), with probability at least 1-δ, the player aims to find the optimal arm with the largest ratio of expected reward to expected cost using as few samples as possible. To solve this problem, we propose three algorithms: 1) a genie-aided algorithm GA; 2) the successive elimination algorithm with unknown gaps SEUG; and 3) the successive elimination algorithm with unknown gaps and variance information SEUG-V, where gaps denote the differences between the optimal arm and the suboptimal arms. We show that for all three algorithms, the sample complexities, i.e., the total numbers of arm pulls, grow logarithmically as \frac{1}{\delta} increases. Moreover, compared to existing works, the operation of our elimination-type algorithms is independent of arm-related parameters, which is more practical. In addition, we provide a fundamental lower bound on the sample complexity of any algorithm under Bernoulli distributions, and show that the sample complexities of the three proposed algorithms match the lower bound in the sense of \log \frac{1}{\delta}. Finally, we validate our theoretical results through numerical experiments.
UR - http://www.scopus.com/inward/record.url?scp=85090284500&partnerID=8YFLogxK
U2 - 10.1109/INFOCOM41043.2020.9155362
DO - 10.1109/INFOCOM41043.2020.9155362
M3 - Conference contribution
AN - SCOPUS:85090284500
T3 - Proceedings - IEEE INFOCOM
SP - 159
EP - 168
BT - INFOCOM 2020 - IEEE Conference on Computer Communications
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 38th IEEE Conference on Computer Communications, INFOCOM 2020
Y2 - 6 July 2020 through 9 July 2020
ER -