
Area-Based Q-learning Algorithm to Search Target Object of Multiple Robots


  • Han-Ul Yoon (School of Electrical and Electronics Engineering, Chung-Ang University)
  • Kwee-Bo Sim (School of Electrical and Electronics Engineering, Chung-Ang University)
  • Published : 2005.08.01

Abstract

In this paper, we present area-based Q-learning for searching for a target object with multiple robots. To search for the target in a Markovian space, the robots must recognize their surroundings at their current locations and generate action rules by themselves. Under area-based Q-learning, a robot first obtains six distances from itself to the environment using infrared sensors arranged hexagonally around it. Second, it calculates six areas from those distances and then takes an action, i.e., turns and moves toward the direction in which the widest space is guaranteed. After the action is taken, the Q-value of that state is updated by the corresponding update formula. We set up an experimental environment with five small mobile robots, obstacles, and a target object, and attempted to search for the target object while navigating an unknown hallway in which some obstacles were placed. At the end of this paper, we present the results of three algorithms: a random search, area-based action making (ABAM), and hexagonal area-based Q-learning.
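The abstract describes the procedure concretely enough that a small sketch may help readers picture it. The code below is a minimal, hypothetical Python rendition, not the authors' implementation: the sector-area formula (a triangle spanned by adjacent hexagonal IR readings), the state discretization, the reward, and all names and constants (six_areas, choose_action, q_update, ALPHA, GAMMA, EPSILON) are assumptions made for illustration; only the one-step Q-learning update follows the standard textbook form.

    import math
    import random
    from collections import defaultdict

    ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1   # learning rate, discount factor, exploration rate (illustrative)
    SIN60 = math.sin(math.pi / 3)           # adjacent hexagonal sensor directions are 60 degrees apart

    def six_areas(distances):
        # Triangle area spanned by each pair of adjacent IR readings: 1/2 * d_i * d_(i+1) * sin(60)
        return [0.5 * distances[i] * distances[(i + 1) % 6] * SIN60 for i in range(6)]

    def discretize(areas, bins=3, scale=1.0):
        # Coarse state signature built from the six areas (purely illustrative discretization)
        return tuple(min(bins - 1, int(a / scale)) for a in areas)

    Q = defaultdict(lambda: [0.0] * 6)      # Q[state][action]; one action per hexagonal sector

    def choose_action(state, areas):
        # Epsilon-greedy over the six sectors, preferring the widest area on ties
        if random.random() < EPSILON:
            return random.randrange(6)
        return max(range(6), key=lambda a: (Q[state][a], areas[a]))

    def q_update(state, action, reward, next_state):
        # Standard one-step Q-learning update: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
        target = reward + GAMMA * max(Q[next_state])
        Q[state][action] += ALPHA * (target - Q[state][action])

    # One illustrative step with hypothetical IR readings (in metres)
    d_now  = [0.8, 1.2, 0.4, 0.9, 1.5, 0.7]
    areas  = six_areas(d_now)
    s      = discretize(areas)
    a      = choose_action(s, areas)
    # ... the robot would turn toward sector a, move, and take new readings ...
    d_next = [1.0, 0.9, 0.6, 1.1, 1.3, 0.8]
    s_next = discretize(six_areas(d_next))
    reward = -0.01                          # e.g. a small step cost; a positive reward when the target is found
    q_update(s, a, reward, s_next)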


Keywords

References

  1. L. Parker, 'Adaptive action selection for cooperative agent teams,' Proc. of 2nd Int. Conf. on Simulation of Adaptive Behavior, pp. 442-450, 1992
  2. G. Ogasawara, T. Omata, and T. Sato, 'Multiple movers using distributed, decision-theoretic control,' Proc. of Japan-USA Symp. on Flexible Automation, vol. 1, pp. 623-630, 1992
  3. D. Ballard, An Introduction to Natural Computation, The MIT Press, Cambridge, 1997
  4. J. Jang, C. Sun, and E. Mizutani, Neuro-Fuzzy and Soft Computing, Prentice-Hall, New Jersey, 1997
  5. A. W. Stroupe and T. Balch, 'Value-based observation with robot teams (VBORT) using probabilistic techniques,' Proc. of Int. Conf. on Advanced Robotics, 2003
  6. A. W. Stroupe and T. Balch, 'Value-based observation with robot teams (VBORT) for dynamic targets,' Proc. of Int. Conf. on Intelligent Robots and Systems, 2003
  7. D. Patterson and J. Hennessy, Computer Organization and Design, Morgan-Kaufmann Korea, 2005
  8. T. Mitchell, Machine Learning, McGraw-Hill, Singapore, 1997
  9. C. Clausen and H. Wechsler, 'Quad-Q-learning,' IEEE Trans. on Neural Networks, vol. 11, pp. 279-294, 2000 https://doi.org/10.1109/72.839000
  10. H-U. Yoon, S-H. Whang, D-W. Kim, and K-B. Sim, 'Strategy of cooperative behaviors of distributed autonomous robotic systems,' Proc. of 10th Int. Symp. on Artificial Life and Robotics, pp. 151-154, 2005
  11. H-U. Yoon and K-B. Sim, 'Hexagon-based Q-learning for object search with multiple robots,' Lecture Notes in Computer Science (LNCS), Springer, vol. 3612, pp. 713-722, 2005