Search | Korea Science

Choi, Sanghee;Chang, Hyeong Soo
- Journal of KIISE
- /
- v.44 no.1
- /
- pp.63-70
- /
- 2017
This paper considers the problem of combining multiple strategies for solving sleeping bandit problems with stochastic rewards and stochastic availability. It also proposes an algorithm, called sleepComb(${\Phi}$), the idea of which is to select an appropriate strategy for each time step based on ${\epsilon}_t$-probabilistic switching. ${\epsilon}_t$-probabilistic switching is used in a well-known parameter-based heuristic ${\epsilon}_t$-greedy strategy. The algorithm also converges to the "best" strategy properly defined on the sleeping bandit problem. In the experimental results, it is shown that sleepComb(${\Phi}$) has convergence, and it converges to the "best" strategy rapidly compared to other combining algorithms. Also, we can see that it chooses the "best" strategy more frequently.
https://doi.org/10.5626/JOK.2017.44.1.63 인용 KSCI