A Method for Learning Macro-Actions for Virtual Characters Using Programming by Demonstration and Reinforcement Learning

  • Sung, Yun-Sick (Dept. of Game Engineering, Graduate School, Dongguk University-Seoul);
  • Cho, Kyun-Geun (Dept. of Multimedia Engineering, Dongguk University-Seoul)
  • Received : 2011.10.06
  • Accepted : 2012.02.09
  • Published : 2012.09.30

Abstract

Decision-making by agents in games is commonly based on reinforcement learning. To improve the quality of agents, the time and state space required for learning must be reduced. Both problems can be addressed with Macro-Actions, which are defined and executed as sequences of primitive actions; they shorten learning time by reducing the number of policy decisions an agent must make. Macro-Actions were originally defined as repetitions of a single primitive action, but studies showing that Macro-Actions can be generated by learning have since extended them to sequences of diverse primitive actions. Generating Macro-Actions by learning, however, still requires an enormous amount of learning time and state space. To resolve these issues, insights from studies on task learning through Programming by Demonstration (PbD) can be applied so that Macro-Actions are generated with less learning time and state space. In this paper, we propose a method for defining and executing Macro-Actions: the Macro-Actions are learned from a human subject via PbD, and a policy over them is learned by reinforcement learning. In an experiment, the proposed method was applied to a car simulation to verify its scalability. Data were collected from the driving control of a human subject, the Macro-Actions required to drive the car were generated, and the policy necessary for driving around a track was learned. Acquiring Macro-Actions by PbD reduced the driving time by about 16% compared with Macro-Actions defined directly by a human subject, and the learning time was also reduced through faster convergence to the optimal policies.
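
The abstract describes a two-stage pipeline: Macro-Actions are first extracted from a human demonstration, and a policy over those Macro-Actions is then learned with reinforcement learning. The Python sketch below illustrates one plausible shape for both stages; it is a minimal illustration rather than the authors' implementation, and the environment interface (reset/step), the frequency-based extraction heuristic, and all function and parameter names are assumptions of this example.

```python
import random
from collections import defaultdict

def extract_macro_actions(demonstration, min_len=2, max_len=4):
    """Treat primitive-action subsequences that recur in a demonstrated
    trajectory as candidate Macro-Actions (a simple PbD-style heuristic;
    the paper's actual extraction procedure may differ)."""
    counts = defaultdict(int)
    for length in range(min_len, max_len + 1):
        for i in range(len(demonstration) - length + 1):
            counts[tuple(demonstration[i:i + length])] += 1
    # Keep only subsequences demonstrated more than once.
    return [list(seq) for seq, n in counts.items() if n > 1]

def q_learn_with_macros(env, macro_actions, episodes=500,
                        alpha=0.1, gamma=0.9, eps=0.1):
    """Tabular Q-learning in which each 'action' is a whole Macro-Action,
    so the agent makes fewer policy decisions per episode. Assumes an
    environment with reset() -> state and step(a) -> (state, reward, done)."""
    q = defaultdict(float)  # maps (state, macro index) to a value estimate
    n = len(macro_actions)
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            # Epsilon-greedy choice among Macro-Actions.
            if random.random() < eps:
                a = random.randrange(n)
            else:
                a = max(range(n), key=lambda m: q[(state, m)])
            # Execute the chosen Macro-Action's primitive actions in
            # sequence, accumulating the discounted reward along the way.
            total, discount, next_state = 0.0, 1.0, state
            for primitive in macro_actions[a]:
                next_state, reward, done = env.step(primitive)
                total += discount * reward
                discount *= gamma
                if done:
                    break
            # Multi-step (SMDP-style) Q-learning update over the macro.
            best_next = 0.0 if done else max(q[(next_state, m)]
                                             for m in range(n))
            q[(state, a)] += alpha * (total + discount * best_next
                                      - q[(state, a)])
            state = next_state
    return q
```

For instance, a demonstrated control sequence in which ["accelerate", "accelerate", "steer_left"] recurs across laps would yield that subsequence as a single Macro-Action; the learner then selects among such sequences rather than among individual primitives, which is what reduces the number of policy decisions per episode.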

Cited by

  1. Self-Organized Cognitive Sensor Networks: Distributed Channel Assignment for Pervasive Sensing, vol. 10, no. 3, 2014. https://doi.org/10.1155/2014/183090
  2. Human behaviors modeling in multi-agent virtual environment, vol. 76, no. 4, 2017. https://doi.org/10.1007/s11042-015-2547-z
  3. Human-Robot Interaction Learning Using Demonstration-Based Learning and Q-Learning in a Pervasive Sensing Environment, vol. 9, no. 11, 2013. https://doi.org/10.1155/2013/782043