Design and Implementation of Reinforcement Learning Agent Using PPO Algorithm for Match 3 Gameplay

  • Park, Dae-Geun (Department of Game Design, Kongju National University) ;
  • Lee, Wan-Bok (Department of Game Design, Kongju National University)
  • Received : 2020.12.29
  • Accepted : 2021.03.20
  • Published : 2021.03.28

Abstract

Most match-3 puzzle games support automatic play using the MCTS algorithm. However, implementing a reinforcement learning agent is not an easy job, because it requires both knowledge of machine learning and familiarity with the complex interactions within the development environment. This study proposes a method by which reinforcement learning agents can be easily designed and gameplay agents implemented by applying the PPO (Proximal Policy Optimization) algorithm, and we found that performance increased by about 44% over the conventional method. The tools used were the Unity 3D game engine and the Unity ML SDK. The experimental results show that the agents learned the game rules and made better strategic decisions as the experiments progressed. On average, the puzzle gameplay agents implemented in this study played the puzzle game better than ordinary people. The designed agent is expected to be useful for speeding up the game level design process.

Match-3 puzzle games have mainly implemented automatic play with the MCTS (Monte Carlo Tree Search) algorithm, but because of MCTS's slow search speed, the general trend is to combine MCTS with a DNN (Deep Neural Network) or to implement the AI with reinforcement learning. In this study, we designed and implemented a reinforcement learning agent that applies the PPO (Proximal Policy Optimization) algorithm, using the Unity 3D engine commonly used for match-3 game development and the machine learning SDK provided by Unity, and we confirmed that its performance improved by about 44%. The experimental results show that the agent learned the game rules and derived better strategic decisions as the experiments progressed, and that it played the puzzle game better than ordinary people. Since the agent designed and implemented in this study plays better than average players, it is expected to help speed up future stage development if it is applied to the game's level design by adjusting the gap between machine and human play levels.
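
To illustrate the kind of agent the abstract describes, the sketch below shows the general shape of a match-3 agent written against the Unity ML-Agents C# API. It is not the paper's code: the class name Match3Agent, the board size, the reward values, and the helper methods (ResetBoard, IsValidMove, ApplyMove, GoalReached) are illustrative assumptions, and it assumes a recent ML-Agents package version in which OnActionReceived takes an ActionBuffers argument.

    // Minimal sketch of a match-3 agent built on Unity ML-Agents (C#).
    // Board dimensions, reward values, and the helper methods below are
    // illustrative placeholders, not the paper's actual implementation.
    using Unity.MLAgents;
    using Unity.MLAgents.Actuators;
    using Unity.MLAgents.Sensors;

    public class Match3Agent : Agent
    {
        const int Rows = 9, Cols = 9, TileTypes = 6;

        // Hypothetical board model: board[r, c] holds the tile type at (r, c).
        int[,] board = new int[Rows, Cols];

        public override void OnEpisodeBegin()
        {
            // Reset the board to a fresh, match-free layout (helper assumed).
            ResetBoard(board);
        }

        public override void CollectObservations(VectorSensor sensor)
        {
            // Observe the tile type of every cell, normalized to [0, 1].
            for (int r = 0; r < Rows; r++)
                for (int c = 0; c < Cols; c++)
                    sensor.AddObservation(board[r, c] / (float)(TileTypes - 1));
        }

        public override void OnActionReceived(ActionBuffers actions)
        {
            // A single discrete action indexes one of the candidate swaps.
            int moveIndex = actions.DiscreteActions[0];

            if (!IsValidMove(board, moveIndex))
            {
                AddReward(-0.1f);               // discourage illegal swaps
                return;
            }

            int cleared = ApplyMove(board, moveIndex); // tiles removed by the swap
            AddReward(0.01f * cleared);                // reward proportional to matches

            if (GoalReached(board))
            {
                AddReward(1.0f);
                EndEpisode();
            }
        }

        // --- Placeholder helpers: the real game logic lives in the puzzle code ---
        void ResetBoard(int[,] b) { /* fill with random, match-free tiles */ }
        bool IsValidMove(int[,] b, int move) { return true; }
        int ApplyMove(int[,] b, int move) { return 0; }
        bool GoalReached(int[,] b) { return false; }
    }

In this setup, the discrete action space (a single branch whose size equals the number of candidate swaps) is configured on the agent's Behavior Parameters component, and PPO training is launched from the Python side with the mlagents-learn command pointed at a trainer configuration file that selects the PPO trainer and its hyperparameters.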

Keywords
