Comparison of Learning Performance by Reinforcement Learning Agent Visibility Information Difference

  • Chan-Seob Kim (School of Games, Hongik University) ;
  • Si-Hwan Jang (Content Research Division, Electronics and Telecommunications Research Institute) ;
  • Seong-Il Yang (Content Research Division, Electronics and Telecommunications Research Institute) ;
  • Shin-Jin Kang (School of Games, Hongik University)
  • Received : 2021.07.20
  • Reviewed : 2021.09.27
  • Published : 2021.10.20

Abstract

Reinforcement learning, in which an artificial intelligence improves itself to find optimal solutions to problems, is a technology of high value in many fields. Games are particularly well suited because they can provide a virtual environment in which a reinforcement learning agent solves problems; the agent understands its own situation and its surroundings through observation variables, the information it receives about the environment. In this experiment, we built a simplified version of a role-playing game's instant dungeon and configured the agent's vision-related observation variables in several different ways. The results show how much each configuration affects learning speed, and they can serve as a reference for reinforcement learning research on role-playing games.
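To make the experimental setup concrete, here is a minimal sketch of how a vision-related observation variable could be exposed to an agent in a simplified grid-based instant dungeon. This is not the authors' code: the `InstantDungeonEnv` class, its `vision_radius` parameter, the tile encoding, and the reward scheme are all illustrative assumptions layered on the gymnasium API.

```python
import numpy as np
import gymnasium as gym
from gymnasium import spaces

class InstantDungeonEnv(gym.Env):
    """Grid dungeon where the agent only observes tiles within vision_radius."""

    def __init__(self, size=16, vision_radius=3):
        super().__init__()
        self.size = size
        self.vision_radius = vision_radius
        side = 2 * vision_radius + 1
        # Egocentric local view: a (side x side) window of tile codes
        # (0 = floor, 1 = wall, 2 = exit) centered on the agent.
        self.observation_space = spaces.Box(0.0, 2.0, shape=(side, side),
                                            dtype=np.float32)
        self.action_space = spaces.Discrete(4)  # up, down, left, right

    def _observe(self):
        # Pad the map with walls so views near the border keep a fixed shape.
        r = self.vision_radius
        padded = np.pad(self.grid, r, constant_values=1)
        y, x = self.agent_pos
        return padded[y:y + 2 * r + 1, x:x + 2 * r + 1].astype(np.float32)

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self.grid = np.zeros((self.size, self.size), dtype=np.uint8)
        self.grid[-1, -1] = 2  # exit in the far corner
        self.agent_pos = np.array([0, 0])
        return self._observe(), {}

    def step(self, action):
        moves = ((-1, 0), (1, 0), (0, -1), (0, 1))
        self.agent_pos = np.clip(self.agent_pos + moves[action],
                                 0, self.size - 1)
        done = bool(self.grid[tuple(self.agent_pos)] == 2)
        reward = 1.0 if done else -0.01  # small step penalty, bonus at the exit
        return self._observe(), reward, done, False, {}
```

Training the same policy at several `vision_radius` settings and comparing the learning curves would mirror the spirit of the experiment; for example, with an off-the-shelf PPO implementation such as Stable Baselines 3:

```python
# Hypothetical comparison loop: identical PPO setup, varying only the
# agent's field of view, with curves logged for TensorBoard.
from stable_baselines3 import PPO

for radius in (1, 3, 5):
    model = PPO("MlpPolicy", InstantDungeonEnv(vision_radius=radius),
                tensorboard_log="./fov_runs")
    model.learn(total_timesteps=100_000, tb_log_name=f"radius_{radius}")
```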


Acknowledgements

This research was supported by the Ministry of Culture, Sports and Tourism and the Korea Creative Content Agency (Project Number: R2019020067). This work was also supported by the National Research Foundation of Korea (NRF) grant funded by the Korean government (MSIT) (No. NRF-2019R1A2C1002525).
