• Title/Summary/Keyword: multi-armed bandits

Search results: 2

Thompson sampling for multi-armed bandits in big data environments (빅데이터 환경에서 다중 슬롯머신 문제에 대한 톰슨 샘플링 방법)

  • Min Kyong Kim; Beom Seuk Hwang
    • The Korean Journal of Applied Statistics, v.37 no.5, pp.663-673, 2024
  • The multi-armed bandit (MAB) problem involves selecting actions to maximize rewards in a dynamic environment. This study explores the application of Thompson sampling, a robust MAB algorithm, in the context of big data analytics and statistical learning theory. Using large-scale banner click data from recommendation systems, we evaluate Thompson sampling's performance across various simulated scenarios, employing advanced approximation techniques. Our findings demonstrate that Thompson sampling, particularly with the Langevin Monte Carlo approximation, maintains robust performance and scalability in big data environments, underscoring its practical significance and adaptability to contemporary challenges in statistical learning.
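
The abstract above applies Thompson sampling to binary banner-click rewards. As a rough illustration only, the sketch below shows standard Beta-Bernoulli Thompson sampling with an exact conjugate posterior; the paper itself studies approximation techniques such as Langevin Monte Carlo for settings where exact posterior sampling is impractical, and the arm count, prior, and simulated click-through rates here are assumptions, not the paper's data.

import numpy as np

def thompson_sampling_bernoulli(get_reward, n_arms, n_rounds, seed=0):
    # Beta-Bernoulli Thompson sampling for binary (click / no-click) rewards.
    rng = np.random.default_rng(seed)
    alpha = np.ones(n_arms)  # Beta posterior "success" counts (Beta(1,1) prior)
    beta = np.ones(n_arms)   # Beta posterior "failure" counts
    total_reward = 0
    for _ in range(n_rounds):
        theta = rng.beta(alpha, beta)   # sample a click probability for each arm
        arm = int(np.argmax(theta))     # play the arm with the largest sampled value
        reward = get_reward(arm)        # observe 1 (click) or 0 (no click)
        alpha[arm] += reward
        beta[arm] += 1 - reward
        total_reward += reward
    return total_reward, alpha, beta

# Illustrative use with simulated click-through rates (not the paper's data).
true_ctr = [0.03, 0.05, 0.04]
env_rng = np.random.default_rng(1)
clicks, a, b = thompson_sampling_bernoulli(
    lambda arm: env_rng.binomial(1, true_ctr[arm]), n_arms=3, n_rounds=10_000)
print(clicks, a, b)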

Combining Multiple Strategies for Sleeping Bandits with Stochastic Rewards and Availability (확률적 보상과 유효성을 갖는 Sleeping Bandits의 다수의 전략을 융합하는 기법)

  • Choi, Sanghee; Chang, Hyeong Soo
    • Journal of KIISE, v.44 no.1, pp.63-70, 2017
  • This paper considers the problem of combining multiple strategies for solving sleeping bandit problems with stochastic rewards and stochastic availability. It proposes an algorithm, called sleepComb(Φ), whose idea is to select an appropriate strategy at each time step via ε_t-probabilistic switching, the mechanism used in the well-known parameter-based heuristic ε_t-greedy strategy. The algorithm converges to the "best" strategy, suitably defined for the sleeping bandit problem. Experimental results show that sleepComb(Φ) converges, that it reaches the "best" strategy more rapidly than other combining algorithms, and that it chooses the "best" strategy more frequently.
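
The abstract above describes selecting one base strategy per round via ε_t-probabilistic switching. The sketch below is a loose, assumed rendering of that idea, not the paper's sleepComb(Φ) algorithm: the decay schedule eps_t, the running-mean scoring of strategies, and the toy availability model are all illustrative assumptions.

import random

def eps_t(t, c=5.0):
    # Assumed decay schedule: exploration probability shrinks over time.
    return min(1.0, c / (t + 1))

def combine_strategies(strategies, env_step, n_rounds, seed=0):
    # Epsilon_t-probabilistic switching over base strategies for a sleeping bandit.
    # env_step(t) -> (available_arms, reward_fn); each strategy maps the list of
    # currently available (non-sleeping) arms to one chosen arm.
    random.seed(seed)
    scores = [0.0] * len(strategies)   # running mean reward per strategy
    counts = [0] * len(strategies)
    for t in range(n_rounds):
        available_arms, reward_fn = env_step(t)
        if random.random() < eps_t(t):
            i = random.randrange(len(strategies))                     # explore: random strategy
        else:
            i = max(range(len(strategies)), key=lambda j: scores[j])  # exploit: best strategy so far
        arm = strategies[i](available_arms)
        r = reward_fn(arm)
        counts[i] += 1
        scores[i] += (r - scores[i]) / counts[i]                      # incremental mean update
    return scores

# Toy example: two simple strategies (lowest- vs. highest-indexed available arm)
# on a 3-arm sleeping bandit whose arms are each available with probability 0.7.
means = [0.2, 0.5, 0.8]
def env_step(t):
    avail = [a for a in range(3) if random.random() < 0.7] or [0]
    return avail, lambda a: 1 if random.random() < means[a] else 0

print(combine_strategies([min, max], env_step, n_rounds=10_000))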