Design and Implementation of Reinforcement Learning Agent Using PPO Algorithm for Match 3 Gameplay

  • Park, Dae-Geun (Department of Game Design, Kongju National University) ;
  • Lee, Wan-Bok (Department of Game Design, Kongju National University)
  • Received : 2020.12.29
  • Accepted : 2021.03.20
  • Published : 2021.03.28

Abstract

Most match-3 puzzle games support automatic play using the MCTS algorithm. However, implementing a reinforcement learning agent is not an easy job, because it requires both knowledge of machine learning and familiarity with the complex interactions within the development environment. This study proposes a method by which reinforcement learning agents can be easily designed and gameplay agents implemented by applying the PPO (Proximal Policy Optimization) algorithm, and we found that performance increased by about 44% over the conventional method. The tools used were the Unity 3D game engine and the Unity ML SDK. The experimental results show that the agents learned the game rules and made better strategic decisions as the experiments progressed. On average, the puzzle gameplay agents implemented in this study played the puzzle game better than ordinary people. The designed agent is expected to be useful for speeding up the game level design process.

Match-3 puzzle games have mainly implemented automatic play with the MCTS (Monte Carlo Tree Search) algorithm, but because of MCTS's slow search speed, the general trend is to combine MCTS with a DNN (Deep Neural Network) or to implement the AI with reinforcement learning. In this study, we designed and implemented a reinforcement learning agent that applies the PPO (Proximal Policy Optimization) algorithm, using the Unity 3D engine commonly used for match-3 game development and the machine learning SDK provided by Unity, and we confirmed that its performance improved by about 44%. The experimental results show that the agent learned the game rules and derived better strategic decisions as the experiments progressed, and that it played the puzzle game better than ordinary people. Since the agent designed and implemented in this study plays better than average players, it is expected to help speed up future stage development if it is applied to the game's level design by adjusting the gap between machine and human play levels.
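
To illustrate the kind of agent the abstract describes, the sketch below shows the general shape of a match-3 agent written against the Unity ML-Agents C# API. It is not the paper's code: the class name Match3Agent, the board size, the reward values, and the helper methods (ResetBoard, IsValidMove, ApplyMove, GoalReached) are illustrative assumptions, and it assumes a recent ML-Agents package version in which OnActionReceived takes an ActionBuffers argument.

    // Minimal sketch of a match-3 agent built on Unity ML-Agents (C#).
    // Board dimensions, reward values, and the helper methods below are
    // illustrative placeholders, not the paper's actual implementation.
    using Unity.MLAgents;
    using Unity.MLAgents.Actuators;
    using Unity.MLAgents.Sensors;

    public class Match3Agent : Agent
    {
        const int Rows = 9, Cols = 9, TileTypes = 6;

        // Hypothetical board model: board[r, c] holds the tile type at (r, c).
        int[,] board = new int[Rows, Cols];

        public override void OnEpisodeBegin()
        {
            // Reset the board to a fresh, match-free layout (helper assumed).
            ResetBoard(board);
        }

        public override void CollectObservations(VectorSensor sensor)
        {
            // Observe the tile type of every cell, normalized to [0, 1].
            for (int r = 0; r < Rows; r++)
                for (int c = 0; c < Cols; c++)
                    sensor.AddObservation(board[r, c] / (float)(TileTypes - 1));
        }

        public override void OnActionReceived(ActionBuffers actions)
        {
            // A single discrete action indexes one of the candidate swaps.
            int moveIndex = actions.DiscreteActions[0];

            if (!IsValidMove(board, moveIndex))
            {
                AddReward(-0.1f);               // discourage illegal swaps
                return;
            }

            int cleared = ApplyMove(board, moveIndex); // tiles removed by the swap
            AddReward(0.01f * cleared);                // reward proportional to matches

            if (GoalReached(board))
            {
                AddReward(1.0f);
                EndEpisode();
            }
        }

        // --- Placeholder helpers: the real game logic lives in the puzzle code ---
        void ResetBoard(int[,] b) { /* fill with random, match-free tiles */ }
        bool IsValidMove(int[,] b, int move) { return true; }
        int ApplyMove(int[,] b, int move) { return 0; }
        bool GoalReached(int[,] b) { return false; }
    }

In this setup, the discrete action space (a single branch whose size equals the number of candidate swaps) is configured on the agent's Behavior Parameters component, and PPO training is launched from the Python side with the mlagents-learn command pointed at a trainer configuration file that selects the PPO trainer and its hyperparameters.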

Keywords
