• Title/Summary/Keyword: epsilon-Greedy

Combining Multiple Strategies for Sleeping Bandits with Stochastic Rewards and Availability (확률적 보상과 유효성을 갖는 Sleeping Bandits의 다수의 전략을 융합하는 기법)

  • Choi, Sanghee;Chang, Hyeong Soo
    • Journal of KIISE, v.44 no.1, pp.63-70, 2017
  • This paper considers the problem of combining multiple strategies for solving sleeping bandit problems with stochastic rewards and stochastic availability. It proposes an algorithm, sleepComb(${\Phi}$), which selects an appropriate strategy at each time step by ${\epsilon}_t$-probabilistic switching, the mechanism used in the well-known parameter-based heuristic ${\epsilon}_t$-greedy strategy. The algorithm converges to the "best" strategy, as properly defined for the sleeping bandit problem. Experimental results show that sleepComb(${\Phi}$) converges, that it reaches the "best" strategy faster than other combining algorithms, and that it chooses the "best" strategy more frequently.
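
The ${\epsilon}_t$-probabilistic switching idea mentioned in this abstract can be illustrated with a minimal sketch. This is not the sleepComb(${\Phi}$) algorithm itself, whose details the abstract does not give. The decay schedule `c / t`, the function names, and the use of empirical mean reward as the ranking criterion are all assumptions made for illustration.

```python
import random

def epsilon_t(t, c=5.0):
    """Decaying switching probability for step t (1-based).

    The c / t schedule is an assumption, not taken from the paper.
    """
    return min(1.0, c / t)

def switch_strategy(t, mean_reward, rng, c=5.0):
    """Pick the index of the strategy to follow at step t.

    With probability epsilon_t, pick a strategy uniformly at random;
    otherwise pick the one with the highest empirical mean reward so far.
    """
    if rng.random() < epsilon_t(t, c):
        return rng.randrange(len(mean_reward))
    return max(range(len(mean_reward)), key=lambda i: mean_reward[i])
```

Because the switching probability shrinks over time, random switching dominates early steps and the empirically best strategy is followed almost always later on.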

An unwanted facility location problem with negative influence cost and transportation cost (기피비용과 수송비용을 고려한 기피시설 입지문제)

  • Yang, Byoung-Hak
    • Journal of the Korea Safety Management & Science, v.15 no.1, pp.77-85, 2013
  • In location science, environmental effect has become a major new consideration in site selection. When selecting a location for an unwanted facility, decision makers should consider the cost of resolving environmental conflict. We introduce a negative influence cost for the facility that is inversely proportional to the distance between the facility and residents. An unwanted facility location problem is formulated to minimize the sum of the negative influence cost and the transportation cost. The objective cost function is nonlinear and is neither convex nor concave. Three GRASP (Greedy Randomized Adaptive Search Procedure) methods, Random_GRASP, Epsilon_GRASP and GRID_GRASP, were developed to solve the unwanted facility location problem. Newton's method for nonlinear optimization was used for the local search in GRASP. Experimental results showed that GRID_GRASP produced better solutions than Random_GRASP and Epsilon_GRASP, while Random_GRASP and Epsilon_GRASP ran faster than GRID_GRASP.
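
The objective described in this abstract, a negative influence cost inversely proportional to distance plus a transportation cost, might be evaluated for one candidate site roughly as follows. The abstract does not give the exact formula, so the proportionality constant `k`, the per-distance `unit_cost`, the Euclidean metric, and the function name are all assumptions.

```python
import math

def total_cost(site, residents, customers, k=1.0, unit_cost=1.0):
    """Hypothetical objective for one candidate facility site.

    influence: k / distance to each resident (assumed form of the
               "inversely proportional" negative influence cost).
    transport: unit_cost * distance to each customer (assumed form).
    The site must not coincide with a resident, or the distance is zero.
    """
    influence = sum(k / math.dist(site, r) for r in residents)
    transport = sum(unit_cost * math.dist(site, c) for c in customers)
    return influence + transport
```

The two terms pull in opposite directions as the site moves away from populated areas, which is consistent with the abstract's remark that the objective is neither convex nor concave.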

The UCT algorithm applied to find the best first move in the game of Tic-Tac-Toe (삼목 게임에서 최상의 첫 수를 구하기 위해 적용된 신뢰상한트리 알고리즘)

  • Lee, Byung-Doo;Park, Dong-Soo;Choi, Young-Wook
    • Journal of Korea Game Society, v.15 no.5, pp.109-118, 2015
  • The game of Go, which originated in ancient China, is regarded as one of the most difficult challenges in the field of AI. Over the past few years, the top computer Go programs based on MCTS have surprisingly beaten professional players in handicap games. MCTS is an approach that simulates random sequences of legal moves until the game ends, and it has replaced the traditional knowledge-based approach. We applied the UCT algorithm, an MCTS variant, to the game of Tic-Tac-Toe to find the best first move, and compared the result with that generated by pure MCTS. Furthermore, to help understand UCB, we introduced the epsilon-Greedy and UCB algorithms for solving the Multi-Armed Bandit problem and compared their performance.
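
The comparison of epsilon-Greedy and UCB on a multi-armed bandit can be sketched as below. This is a generic textbook illustration, not the paper's experimental setup: the Bernoulli reward model, the fixed `eps=0.1`, the UCB1 exploration bonus `sqrt(2 ln t / n)`, and all function names are assumptions.

```python
import math
import random

def run_bandit(select, probs, steps, rng):
    """Play a Bernoulli bandit and return how often each arm was pulled."""
    counts = [0] * len(probs)
    sums = [0.0] * len(probs)
    for t in range(1, steps + 1):
        arm = select(t, counts, sums, rng)
        reward = 1.0 if rng.random() < probs[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
    return counts

def epsilon_greedy(t, counts, sums, rng, eps=0.1):
    """Explore uniformly with probability eps, otherwise exploit."""
    if rng.random() < eps:
        return rng.randrange(len(counts))
    means = [s / n if n else 0.0 for s, n in zip(sums, counts)]
    return max(range(len(counts)), key=lambda i: means[i])

def ucb1(t, counts, sums, rng):
    """Pull each arm once, then maximize mean plus a confidence bonus."""
    for i, n in enumerate(counts):
        if n == 0:
            return i
    scores = [
        sums[i] / counts[i] + math.sqrt(2.0 * math.log(t) / counts[i])
        for i in range(len(counts))
    ]
    return max(range(len(counts)), key=lambda i: scores[i])
```

For example, `run_bandit(ucb1, [0.2, 0.8], 1000, random.Random(0))` returns the pull counts per arm. epsilon-Greedy keeps exploring at a constant rate, whereas the UCB1 bonus shrinks for arms that have already been sampled often.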

Developing Novel Algorithms to Reduce the Data Requirements of the Capture Matrix for a Wind Turbine Certification (풍력 발전기 평가를 위한 수집 행렬 데이터 절감 알고리즘 개발)

  • Lee, Jehyun;Choi, Jungchul
    • New & Renewable Energy, v.16 no.1, pp.15-24, 2020
  • For mechanical load testing of wind turbines, a capture matrix is constructed over a range of wind speeds according to the international standard IEC 61400-13. The conventional method wastes a considerable amount of data because of its invalid-data policy: data are segmented into 10-minute blocks and the invalid blocks are removed. We previously suggested an alternative way to reduce the total amount of data needed to build a capture matrix, but the efficient selection of data remained an open question. This paper introduces optimization algorithms that construct the capture matrix with less data. Heuristic algorithms (simple stacking and lowest frequency first), a population method (particle swarm optimization) and Q-Learning with epsilon-greedy exploration are compared. All algorithms perform better than the conventional method, although the degree of improvement varies widely. The best performance was achieved by the heuristic method (lowest frequency first), with particle swarm optimization performing similarly: approximately 28% data reduction on average and more than 40% at maximum. Unexpectedly, the worst performance came from Q-Learning, which had been a promising candidate at the outset. This study is helpful not only for wind turbine evaluation, particularly from the viewpoint of cost, but also for understanding the nature of wind speed data.
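
Q-Learning with epsilon-greedy exploration, as named in this abstract, combines two small pieces that can be sketched as follows. The abstract does not describe how states, actions, or rewards were defined for the capture matrix, so this is only the generic tabular form. The table layout `q[state][action]`, the default `alpha` and `gamma` values, and the function names are assumptions.

```python
import random

def choose_action(q_row, eps, rng):
    """Epsilon-greedy choice over one row of the Q-table.

    With probability eps, pick a random action; otherwise pick the
    action with the highest current Q-value.
    """
    if rng.random() < eps:
        return rng.randrange(len(q_row))
    return max(range(len(q_row)), key=lambda a: q_row[a])

def q_update(q, s, a, r, s_next, alpha=0.5, gamma=0.9):
    """Standard tabular Q-Learning update, applied in place.

    Moves q[s][a] toward the target r + gamma * max over next actions.
    """
    target = r + gamma * max(q[s_next])
    q[s][a] += alpha * (target - q[s][a])
```

A training loop would alternate `choose_action` and `q_update` while stepping an environment, typically lowering `eps` over time.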