A Study on Application of Reinforcement Learning Algorithm Using Pixel Data

  • Saemaro Moon (Graduate School of Software, Soongsil University)
  • Yongrak Choi (Graduate School of Software, Soongsil University)
  • Received : 2016.10.17
  • Accepted : 2016.10.28
  • Published : 2016.12.31

Abstract

Recently, deep learning and machine learning have attracted considerable attention, and many supporting frameworks have appeared. In the artificial intelligence field, a large body of research is underway to apply such techniques to complex problem solving, which requires applying various learning algorithms and training methods to artificial intelligence systems. In addition, there is a dearth of performance evaluation of decision-making agents. The decision-making agent designed in this research finds optimal solutions through reinforcement learning: it collects raw pixel data observed from a dynamic environment and makes decisions by itself based on that data. The agent uses convolutional neural networks to classify the situations it confronts, and the data observed from the environment undergoes preprocessing before being used. This research describes how the convolutional neural networks and the decision-making agent are configured, analyzes learning performance under a value-based algorithm (Deep Q-Networks) and a policy-based algorithm (Policy Gradient), sets forth their differences, and demonstrates how the convolutional neural networks affect overall learning performance when pixel data is used. This research is expected to contribute to the improvement of artificial intelligence systems that can efficiently find optimal solutions using features extracted from raw pixel data.
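
As a concrete illustration of the pipeline the abstract describes (raw pixels, preprocessing, convolutional network, learning update), the sketch below is a minimal example, not the authors' implementation: it assumes PyTorch, an Atari-style 210x160 RGB frame, a four-frame input stack, and illustrative layer sizes and hyperparameters. It shows the two styles of update the abstract compares, a value-based Deep Q-Networks step and a policy-based REINFORCE loss.

    import numpy as np
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    def preprocess(frame):
        # Grayscale and downsample one raw RGB frame (assumed 210x160x3),
        # scaling pixel values to [0, 1] before feeding them to the network.
        gray = frame.mean(axis=2)           # crude luminance approximation
        small = gray[::2, ::2] / 255.0      # downsample to 105x80
        return small.astype(np.float32)

    class PixelNet(nn.Module):
        # Small convolutional network mapping a stack of preprocessed frames
        # to one output per action (Q-values or policy logits).
        def __init__(self, n_actions, in_frames=4):
            super().__init__()
            self.conv1 = nn.Conv2d(in_frames, 16, kernel_size=8, stride=4)
            self.conv2 = nn.Conv2d(16, 32, kernel_size=4, stride=2)
            self.fc1 = nn.Linear(32 * 11 * 8, 256)  # 11x8 feature map for 105x80 input
            self.fc2 = nn.Linear(256, n_actions)

        def forward(self, x):
            x = F.relu(self.conv1(x))
            x = F.relu(self.conv2(x))
            x = F.relu(self.fc1(x.flatten(start_dim=1)))
            return self.fc2(x)

    def dqn_update(q_net, target_net, optimizer, batch, gamma=0.99):
        # Value-based step: regress Q(s, a) toward r + gamma * max_a' Q_target(s', a').
        s, a, r, s2, done = batch
        q = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
        with torch.no_grad():
            target = r + gamma * (1.0 - done) * target_net(s2).max(dim=1).values
        loss = F.smooth_l1_loss(q, target)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()

    def reinforce_loss(policy_net, states, actions, returns):
        # Policy-based loss: -E[log pi(a|s) * G], the REINFORCE policy gradient.
        log_probs = F.log_softmax(policy_net(states), dim=1)
        chosen = log_probs.gather(1, actions.unsqueeze(1)).squeeze(1)
        return -(chosen * returns).mean()

Note that the same convolutional trunk can serve both algorithms; only the interpretation of the output layer (Q-values versus policy logits) and the update rule change, which is the comparison the abstract draws.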
