
The Development of Janggi Board Game Using Backpropagation Neural Network and Q Learning Algorithm

황상문 (Department of Electronics and Information, Jeonju Technical College)
박인규 (Department of Computer Science, Division of Information Engineering, Joongbu University)
백덕수 (Department of Electronics and Information, Iksan College)
진달복 (Department of Electronic Engineering, School of Electrical and Electronic Engineering, Wonkwang University)
Abstract
This paper proposes a strategy-learning method that fuses a back-propagation neural network with the Q-learning algorithm for the two-person, deterministic Janggi board game. Learning is accomplished simply by having the two sides play against each other. The system consists of two parts: a move generator and a search kernel. The former generates the moves available on the board; the latter combines back-propagation and Q-learning with $\alpha\beta$ search in order to learn the evaluation function. Whereas temporal-difference learning captures the discrepancy between adjacent rewards, Q-learning acquires optimal policies even without prior knowledge of how its moves affect the environment, by learning an evaluation function over the augmented rewards. Based on the evaluation function obtained over many games of learning, the winning percentage proved, in general, to be linearly proportional to the amount of learning.
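The paper's own source code is not included here; the following is a minimal sketch, assuming a NumPy environment, of how a one-hidden-layer backpropagation network can serve as a board-evaluation function and be updated toward a Q-learning target during self-play. All names, layer sizes, and the feature encoding are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

# Sketch (assumed, not from the paper): a backpropagation network used as a
# position-evaluation function, trained with a Q-learning-style target
#   target = reward + gamma * max over successor evaluations.

rng = np.random.default_rng(0)

N_FEATURES = 90      # e.g. one input per board square (assumption)
N_HIDDEN = 32        # hidden-layer size (assumption)

W1 = rng.normal(0.0, 0.1, (N_HIDDEN, N_FEATURES))
b1 = np.zeros(N_HIDDEN)
W2 = rng.normal(0.0, 0.1, N_HIDDEN)
b2 = 0.0

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def evaluate(features):
    """Forward pass: output in (0, 1), read as the value of the position."""
    h = sigmoid(W1 @ features + b1)
    return sigmoid(W2 @ h + b2), h

def q_update(features, reward, next_values, lr=0.01, gamma=0.95):
    """One backpropagation step toward the Q-learning target.

    next_values: evaluations of the positions reachable from the successor
    state (empty at the end of a game).
    """
    global W1, b1, W2, b2
    value, h = evaluate(features)
    target = reward + (gamma * max(next_values) if next_values else 0.0)
    # Gradient of the squared error through the output sigmoid
    delta_out = (value - target) * value * (1.0 - value)
    delta_hidden = delta_out * W2 * h * (1.0 - h)
    W2 -= lr * delta_out * h
    b2 -= lr * delta_out
    W1 -= lr * np.outer(delta_hidden, features)
    b1 -= lr * delta_hidden
    return target - value   # TD/Q error, useful for monitoring learning

# Usage: after each move during self-play, call q_update with the previous
# position's features, the immediate reward (e.g. material gained), and the
# evaluations of the moves available from the new position.
```

The evaluation function learned this way would then sit at the leaves of the $\alpha\beta$ search described in the abstract; the search itself is omitted from this sketch.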