http://dx.doi.org/10.22156/CS4SMB.2021.11.03.001

Design and Implementation of Reinforcement Learning Agent Using PPO Algorithm for Match 3 Gameplay

Park, Dae-Geun (Department of Game Design, Kongju National University)
Lee, Wan-Bok (Department of Game Design, Kongju National University)
Publication Information
Journal of Convergence for Information Technology / v.11, no.3, 2021, pp. 1-6
Abstract
Most match-3 puzzle games support automatic play using the MCTS algorithm. However, implementing a reinforcement learning agent is not an easy job, because it requires both knowledge of machine learning and complex interactions with the development environment. This study proposes a method by which reinforcement learning agents can be easily designed and gameplay agents implemented by applying the PPO (Proximal Policy Optimization) algorithm. The measured performance was about 44% higher than that of the conventional method. The tools used were the Unity 3D game engine and the Unity ML-Agents SDK. The experimental results show that the agents learned the game rules and made increasingly better strategic decisions as training progressed. On average, the puzzle gameplay agents implemented in this study played the puzzle game better than ordinary human players. The designed agent is expected to be useful for speeding up the game level design process.
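The article does not reproduce the agent's source code, so the following is a minimal sketch, written against the Unity ML-Agents C# API, of how a match-3 board might be exposed to an external PPO trainer. The board size, the one-hot tile encoding, the reward values, and the ApplySwap helper are illustrative assumptions, not the authors' implementation.

```csharp
using Unity.MLAgents;
using Unity.MLAgents.Actuators;
using Unity.MLAgents.Sensors;

// Minimal sketch: a match-3 board exposed through ML-Agents so an
// external PPO trainer can learn to pick swaps. Sizes, rewards, and
// the ApplySwap helper below are assumptions, not the paper's code.
public class Match3Agent : Agent
{
    const int Width = 9, Height = 9, TileTypes = 6; // assumed board layout

    int[,] board = new int[Width, Height];

    public override void OnEpisodeBegin()
    {
        // Refill the board with random tiles at the start of each episode.
        var rng = new System.Random();
        for (int x = 0; x < Width; x++)
            for (int y = 0; y < Height; y++)
                board[x, y] = rng.Next(TileTypes);
    }

    public override void CollectObservations(VectorSensor sensor)
    {
        // One-hot encode every cell (Width * Height * TileTypes floats).
        for (int x = 0; x < Width; x++)
            for (int y = 0; y < Height; y++)
                for (int t = 0; t < TileTypes; t++)
                    sensor.AddObservation(board[x, y] == t ? 1f : 0f);
    }

    public override void OnActionReceived(ActionBuffers actions)
    {
        // One discrete branch: an index over all candidate swaps.
        int move = actions.DiscreteActions[0];
        int cleared = ApplySwap(move);      // hypothetical helper
        AddReward(cleared);                 // reward tiles cleared by the swap
        if (cleared == 0) AddReward(-0.1f); // small penalty for a wasted move
    }

    // Hypothetical stand-in for the real swap/match/cascade logic.
    int ApplySwap(int move) => 0;
}
```

Training would then be launched outside Unity with the mlagents-learn command and a trainer configuration that selects PPO; the agent's Behavior Parameters component declares the matching observation and action sizes.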
Keywords
Game AI; Auto Play; Reinforcement Learning; Match 3 Puzzle; Unity ML;
References
1 E. Poromaa. (2017). Crushing Candy Crush: Predicting Human Success Rate in a Mobile Game using Monte-Carlo Tree Search. Student thesis. KTH.
2 R. Coulom. (2006). Efficient Selectivity and Backup Operators in Monte-Carlo Tree Search. 5th International Conference on Computers and Games. May 29-31.
3 D. Silver. (2016). Mastering the game of Go with deep neural networks and tree search. Nature. 529(7587). 484-489.
4 A. Andelkovic. (2018). Using Artificial Intelligence to Test the Candy Crush Saga Game. comaqa. (Online). https://www.youtube.com/watch?v=4xECMpgeOxE/
5 D. Silver. (2017). Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm. Science. 362. 1140-1144.
6 L. Kocsis & C. Szepesvari. (2006). Bandit Based Monte-Carlo Planning. In European Conference on Machine Learning (pp. 282-293). Springer, Berlin, Heidelberg.
7 S. Gelly, Y. Wang, R. Munos & O. Teytaud. (2006). Modification of UCT with Patterns in Monte-Carlo Go. Computer Science.
8 Aiandgames. (2018). Monte-Carlo Tree Search in TOTAL WAR: ROME II's Campaign AI. aiandgames. (Online). https://aiandgames.com/revolutionary-warfare-the-ai-of-total-war-part-3/
9 M. van Otterlo & M. A. Wiering. (2012). Reinforcement learning and Markov decision processes. In Reinforcement Learning (pp. 3-42). Springer, Berlin, Heidelberg. DOI : 10.1007/978-3-642-27645-3_1
10 F. S. Melo. (2007). Convergence of Q-learning: A simple proof. Proceedings of the European Control Conference 2007. 2-5.
11 R. Bellman. (1957). A Markovian Decision Process. Journal of Mathematics and Mechanics. 6(5). 679-684.
12 M. Tokic & G. Palm. (2011). Value-Difference Based Exploration: Adaptive Control Between Epsilon-Greedy and Softmax. Advances in Artificial Intelligence, Lecture Notes in Computer Science. 7006. 335-346. DOI : 10.1007/978-3-642-24455-1_33
13 S. Purmonen. (2017). Predicting Game Level Difficulty Using Deep Neural Networks. Student thesis. KTH.