http://dx.doi.org/10.9716/KITS.2016.15.4.085

A Study on Application of Reinforcement Learning Algorithm Using Pixel Data  

Moon, Saemaro (Graduate School of Software, Soongsil University)
Choi, Yonglak (Graduate School of Software, Soongsil University)
Publication Information
Journal of Information Technology Services, Vol.15, No.4, 2016, pp. 85-95
Abstract
Recently, deep learning and machine learning have attracted considerable attention, and many supporting frameworks have appeared. In the artificial intelligence field, a large body of research is under way to apply this knowledge to complex problem solving, which requires applying various learning algorithms and training methods to artificial intelligence systems. In addition, performance evaluations of decision-making agents remain scarce. The decision-making agent designed through this research finds optimal solutions by using reinforcement learning methods: it collects raw pixel data observed from dynamic environments and makes decisions by itself based on that data. The agent uses convolutional neural networks to classify the situations it confronts, and the data observed from the environment is preprocessed before being used. This research describes how the convolutional neural networks and the decision-making agent are configured, analyzes learning performance with a value-based algorithm and a policy-based algorithm (Deep Q-Networks and Policy Gradient, respectively), sets forth their differences, and demonstrates how the convolutional neural networks affect overall learning performance when pixel data is used. This research is expected to contribute to the improvement of artificial intelligence systems that can efficiently find optimal solutions by using features extracted from raw pixel data.
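To make the abstract's contrast concrete, the sketch below pairs a small convolutional trunk for stacked pixel frames with the two losses being compared: the bootstrapped temporal-difference target of Deep Q-Networks (Mnih et al., 2015) and the likelihood-ratio gradient of REINFORCE-style Policy Gradient (Sutton et al., 1999). This is a minimal PyTorch illustration under assumed Atari-style preprocessing (four stacked 84x84 grayscale frames); the layer sizes, discount factor, and loss choices are assumptions for illustration, not the authors' configuration.

import torch
import torch.nn as nn
import torch.nn.functional as F

class PixelCNN(nn.Module):
    # Convolutional trunk mapping preprocessed frames (assumed: 4 stacked
    # 84x84 grayscale frames) to one output per action.
    def __init__(self, n_actions):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(4, 16, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=4, stride=2), nn.ReLU(),
        )
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 9 * 9, 256), nn.ReLU(),  # 9x9 feature map after the convs
            nn.Linear(256, n_actions),
        )

    def forward(self, x):
        return self.head(self.conv(x))

def dqn_loss(q_net, target_net, s, a, r, s_next, done, gamma=0.99):
    # Value-based: regress Q(s, a) toward r + gamma * max_a' Q_target(s', a').
    q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        best_next = target_net(s_next).max(dim=1).values
        target = r + gamma * best_next * (1.0 - done)
    return F.smooth_l1_loss(q_sa, target)

def pg_loss(policy_net, s, a, returns):
    # Policy-based (REINFORCE): raise the log-probability of each taken
    # action in proportion to the return that followed it.
    log_probs = F.log_softmax(policy_net(s), dim=1)
    taken = log_probs.gather(1, a.unsqueeze(1)).squeeze(1)
    return -(taken * returns).mean()

In both cases the convolutional layers do the job the abstract describes, turning raw pixels into features; only the interpretation of the network's output changes: Q-values regressed toward a bootstrapped target in the value-based case, action logits pushed toward higher-return actions in the policy-based case.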
Keywords
Artificial Intelligence; Reinforcement Learning; CNN (Convolutional Neural Networks); DQN (Deep Q-Networks); PG (Policy Gradient)