A non-parametric solution to the multi-armed bandit problem with covariates

Mingyao Ai, Yimin Huang, Jun Yu*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

1 Citation (Scopus)

Abstract

In recent years, the multi-armed bandit problem has regained popularity, especially in the case with covariates, owing to new applications in customized services such as personalized medicine. To address the bandit problem with covariates, a policy called binned subsample mean comparison is introduced; it decomposes the original problem into a collection of suitable classic bandit problems. The growth rate of the regret is studied in a setting where the reward of each arm depends on observable covariates. When the rewards follow an exponential family, it is shown that the regret of the proposed method achieves a nearly optimal growth rate. Simulations show that the proposed policy performs competitively against other policies.
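The idea described in the abstract can be sketched as follows: partition the covariate space into bins and, within each bin, run a classic bandit policy based on comparing subsample means of the arms. The sketch below is illustrative only; the bin construction, the subsampling scheme, and all function names here are assumptions, not the authors' actual algorithm or tuning.

```python
import random

def smc_policy(arm_histories, n_arms, rng):
    """Choose an arm within one bin by subsample mean comparison:
    every arm's history is subsampled down to the size of the
    least-pulled arm before the means are compared (illustrative sketch)."""
    # Pull each arm once before any comparison is meaningful.
    for a in range(n_arms):
        if not arm_histories[a]:
            return a
    m = min(len(h) for h in arm_histories)
    # Compare means of equal-size random subsamples (assumed scheme).
    means = [sum(rng.sample(arm_histories[a], m)) / m for a in range(n_arms)]
    return max(range(n_arms), key=lambda a: means[a])

def binned_bandit(rounds, covariates, reward_fn, n_bins, n_arms, seed=0):
    """Binned policy sketch: partition the covariate range [0, 1) into
    equal-width bins and run an independent subsample-mean-comparison
    bandit inside each bin."""
    rng = random.Random(seed)
    # histories[bin][arm] -> list of rewards observed for that arm in that bin
    histories = [[[] for _ in range(n_arms)] for _ in range(n_bins)]
    pulls = []
    for t in range(rounds):
        x = covariates[t]
        b = min(int(x * n_bins), n_bins - 1)   # bin index of the covariate
        a = smc_policy(histories[b], n_arms, rng)
        histories[b][a].append(reward_fn(t, x, a))
        pulls.append(a)
    return pulls
```

Because each bin keeps its own histories, an arm that is best only for some covariate values can still dominate in the bins where it is best, which is the point of the decomposition into classic bandit problems.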

Original language: English
Pages (from-to): 402-413
Number of pages: 12
Journal: Journal of Statistical Planning and Inference
Volume: 211
DOIs
Publication status: Published - Mar 2021

Keywords

  • Efficient policy
  • Multi-armed bandit problem
  • Nonparametric solution
  • Subsample comparisons

