Abstract
We consider the multichannel access problem in which each of N channels is modeled as a multistate Markov chain. At each time instant, a transmitter accesses M channels and obtains some reward depending on the states of those chosen channels. The considered problem can be cast into a restless multiarmed bandit (RMAB) problem. It is well-known that solving the RMAB problem is PSPACE-hard. A natural alternative is to consider the myopic policy that maximizes the immediate reward but ignores the impact of the current strategy on the future reward. In this letter, we perform an analytical study on structure, optimality, and performance of the myopic policy for the considered RMAB problem.We show that the myopic policy has a simple robust structure that reduces channel selection to a roundrobin procedure. The optimality of this simple policy is established for accessing M = N - 1 of N channels and conjectured for the general case of arbitrary M based on the structure of myopic policy.
| Original language | English |
|---|---|
| Article number | 2503770 |
| Pages (from-to) | 300-303 |
| Number of pages | 4 |
| Journal | IEEE Communications Letters |
| Volume | 20 |
| Issue number | 2 |
| DOIs | |
| Publication status | Published - 1 Feb 2016 |
| Externally published | Yes |
Keywords
- Myopic policy
- Optimality
- PSPACE-Hard
- RMAB