• Title/Summary/Keyword: Sleeping bandit problem


Combining Multiple Strategies for Sleeping Bandits with Stochastic Rewards and Availability

  • Choi, Sanghee; Chang, Hyeong Soo
    • Journal of KIISE, v.44 no.1, pp.63-70, 2017
  • This paper considers the problem of combining multiple strategies for solving sleeping bandit problems with stochastic rewards and stochastic availability. It proposes an algorithm, sleepComb($\Phi$), which selects a strategy at each time step by $\epsilon_t$-probabilistic switching, the mechanism used in the well-known parameter-based heuristic $\epsilon_t$-greedy strategy. The algorithm converges to the "best" strategy, where "best" is properly defined for the sleeping bandit problem. Experimental results show that sleepComb($\Phi$) converges, that it reaches the "best" strategy more rapidly than other combining algorithms, and that it chooses the "best" strategy more frequently.
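
The switching idea in the abstract can be illustrated with a minimal Python sketch. It is not the paper's sleepComb($\Phi$) definition, which the abstract does not give. The decaying schedule `epsilon_schedule` (min(1, c/t)), the uniform exploration over strategies, the use of each strategy's empirical mean reward as its score, and the two toy strategies in the usage note are all assumptions made for illustration.

```python
import random


def epsilon_schedule(t, c=5.0):
    """Assumed decaying schedule eps_t = min(1, c / t); not taken from the paper."""
    return min(1.0, c / t)


def switch_strategy(t, mean_reward, rng, c=5.0):
    """Choose a strategy index by eps_t-probabilistic switching."""
    if rng.random() < epsilon_schedule(t, c):
        # Explore: pick any strategy uniformly at random.
        return rng.randrange(len(mean_reward))
    # Exploit: pick the strategy with the highest empirical mean reward so far.
    return max(range(len(mean_reward)), key=mean_reward.__getitem__)


def run(strategies, reward_prob, avail_prob, horizon, seed=0):
    """Simulate a sleeping bandit with Bernoulli rewards and Bernoulli availability.

    strategies  : list of callables (available_arms, rng) -> chosen arm
    reward_prob : per-arm probability of reward 1
    avail_prob  : per-arm probability of being awake at each step
    """
    rng = random.Random(seed)
    counts = [0] * len(strategies)         # how often each strategy was followed
    mean_reward = [0.0] * len(strategies)  # running mean reward per strategy
    for t in range(1, horizon + 1):
        # Stochastic availability: each arm is awake independently.
        available = [a for a, p in enumerate(avail_prob) if rng.random() < p]
        if not available:
            continue  # every arm is asleep, nothing to play this step
        k = switch_strategy(t, mean_reward, rng)
        arm = strategies[k](available, rng)
        reward = 1.0 if rng.random() < reward_prob[arm] else 0.0
        counts[k] += 1
        mean_reward[k] += (reward - mean_reward[k]) / counts[k]
    return counts, mean_reward
```

As a usage example, two hypothetical strategies such as `lambda available, rng: available[0]` and `lambda available, rng: available[-1]` can be passed to `run` together with per-arm reward and availability probabilities. The returned `counts` list then shows how often each strategy was followed, which is the kind of selection frequency the abstract refers to.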