• Title/Summary/Keyword: Q learning

Improved Deep Q-Network Algorithm Using Self-Imitation Learning (Self-Imitation Learning을 이용한 개선된 Deep Q-Network 알고리즘)

  • Sunwoo, Yung-Min;Lee, Won-Chang
    • Journal of IKEEE
    • /
    • v.25 no.4
    • /
    • pp.644-649
    • /
    • 2021
  • Self-Imitation Learning is a simple off-policy actor-critic algorithm that helps an agent find an optimal policy by exploiting past good experiences. When Self-Imitation Learning is combined with reinforcement learning algorithms that have an actor-critic architecture, it shows performance improvements in various game environments. However, its application has been limited to algorithms with an actor-critic architecture. In this paper, we propose a method for applying Self-Imitation Learning to Deep Q-Network, a value-based deep reinforcement learning algorithm, and train it in various game environments. By comparing the proposed algorithm with ordinary Deep Q-Network training results, we also show that Self-Imitation Learning can be applied to Deep Q-Network to improve its performance.
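
For orientation, the core modification described in the abstract can be sketched as an extra loss term on top of the ordinary DQN objective. The following is a minimal PyTorch-style sketch, assuming batched tensors; the function name, the `sil_weight` parameter, and the tensor shapes are illustrative and not taken from the paper.

```python
import torch
import torch.nn.functional as F

def dqn_sil_loss(q_values, actions, td_targets, mc_returns, sil_weight=1.0):
    """Combine the ordinary DQN TD loss with a self-imitation term.

    q_values:   (batch, n_actions) Q(s, .) from the online network
    actions:    (batch,) long tensor of actions actually taken
    td_targets: (batch,) bootstrapped targets r + gamma * max_a' Q_target(s', a')
    mc_returns: (batch,) discounted returns of stored past episodes
    """
    q_sa = q_values.gather(1, actions.unsqueeze(1)).squeeze(1)

    # Standard DQN temporal-difference loss.
    td_loss = F.smooth_l1_loss(q_sa, td_targets)

    # Self-imitation: only learn from transitions whose realized return
    # exceeded the current value estimate, i.e. "past good experiences".
    sil_advantage = torch.clamp(mc_returns - q_sa, min=0.0)
    sil_loss = (sil_advantage ** 2).mean()

    return td_loss + sil_weight * sil_loss
```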

Implementation of the Agent using Universal On-line Q-learning by Balancing Exploration and Exploitation in Reinforcement Learning (강화 학습에서의 탐색과 이용의 균형을 통한 범용적 온라인 Q-학습이 적용된 에이전트의 구현)

  • 박찬건;양성봉
    • Journal of KIISE:Software and Applications
    • /
    • v.30 no.7_8
    • /
    • pp.672-680
    • /
    • 2003
  • A shopbot is a software agent whose goal is to maximize a buyer's satisfaction by automatically gathering price and quality information about goods, as well as services, from on-line sellers. In response to shopbots' activities, sellers on the Internet need agents called pricebots that can help them maximize their own profits. In this paper we adopt Q-learning, one of the model-free reinforcement learning methods, as a price-setting algorithm for pricebots. A Q-learned agent increases profitability and eliminates cyclic price wars when compared with agents using the myoptimal (myopically optimal) pricing strategy. Q-learning needs to select a sequence of state-action pairs for convergence. When the uniform random method of selecting state-action pairs is used, the number of accesses to the Q-tables needed to obtain the optimal Q-values is quite large; therefore, it is not appropriate for universal on-line learning in a real-world environment. This phenomenon occurs because uniform random selection reflects the uncertainty of exploitation with respect to the optimal policy. In this paper, we propose a Mixed Nonstationary Policy (MNP), which consists of both an auxiliary Markov process and the original Markov process. MNP tries to keep a balance between exploration and exploitation in reinforcement learning. Our experimental results show that the Q-learning agent using MNP converges to the optimal Q-values about 2.6 times faster on average than uniform random selection.
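
As background for the exploration-exploitation issue discussed above, a generic tabular Q-learning step with ε-greedy selection is sketched below; the paper's Mixed Nonstationary Policy replaces this kind of selection rule and is not reproduced here. All names and parameter values are illustrative.

```python
import random
from collections import defaultdict

def epsilon_greedy(Q, state, actions, epsilon=0.1):
    """Explore with probability epsilon, otherwise exploit the best known action."""
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])

def q_update(Q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.9):
    """One-step Q-learning: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(Q[(next_state, a)] for a in actions)
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])

# Tabular Q-values, e.g. keyed by (price_state, price_action) for a pricebot.
Q = defaultdict(float)
```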

Cooperative Robot for Table Balancing Using Q-learning (테이블 균형맞춤 작업이 가능한 Q-학습 기반 협력로봇 개발)

  • Kim, Yewon;Kang, Bo-Yeong
    • The Journal of Korea Robotics Society
    • /
    • v.15 no.4
    • /
    • pp.404-412
    • /
    • 2020
  • Typically, everyday tasks in human life involve at least two people moving objects such as tables and beds, and the balance of such an object changes based on one person's action. However, many previous studies performed such tasks with robots alone, without factoring in human cooperation. Therefore, in this paper, we propose a cooperative robot for table balancing using Q-learning that enables cooperative work between a human and a robot. The proposed robot recognizes the human's action in order to balance the table: its camera captures an image of the table's state, and the robot performs the table-balancing action corresponding to the recognized human action without high-performance equipment. The classification of human actions uses a deep learning technique, specifically AlexNet, and achieves an accuracy of 96.9% under 10-fold cross-validation. The Q-learning experiment was carried out over 2,000 episodes with 200 trials. The overall results show that the Q function converged stably within this number of episodes, and this stable convergence determined the Q-learning policies for the robot's actions. A video of the robot cooperating with a human on the table-balancing task using the proposed Q-learning can be found at http://ibot.knu.ac.kr/videocooperation.html.
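
A rough sketch of how such a learning loop could look, assuming the recognized human action serves as the state and the robot chooses among a few balancing actions; the callbacks, action labels, and reward shaping below are hypothetical stand-ins, not the paper's implementation.

```python
import random

# Illustrative labels only; the paper's exact state and action sets are not given here.
ROBOT_ACTIONS = ["raise_end", "lower_end", "hold"]

def run_episode(Q, classify_human_action, execute, table_tilt,
                epsilon=0.1, alpha=0.1, gamma=0.9, steps=200):
    """One Q-learning episode for the table-balancing setting sketched in the abstract:
    the state is the recognized human action (an AlexNet output in the paper), the robot
    picks a balancing action, and it is rewarded for keeping the table level."""
    for _ in range(steps):
        state = classify_human_action()                 # hypothetical perception callback
        if random.random() < epsilon:
            action = random.choice(ROBOT_ACTIONS)
        else:
            action = max(ROBOT_ACTIONS, key=lambda a: Q.get((state, a), 0.0))
        execute(action)                                 # hypothetical robot interface
        reward = -abs(table_tilt())                     # level table -> reward close to 0
        next_state = classify_human_action()
        best_next = max(Q.get((next_state, a), 0.0) for a in ROBOT_ACTIONS)
        old = Q.get((state, action), 0.0)
        Q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
```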

Avoidance Behavior of Small Mobile Robots based on the Successive Q-Learning

  • Kim, Min-Soo
    • Proceedings of the Institute of Control, Robotics and Systems Conference
    • /
    • 2001.10a
    • /
    • pp.164.1-164
    • /
    • 2001
  • Q-learning is a recent reinforcement learning algorithm that does not require a model of the environment, making it a suitable approach for learning behaviors of autonomous agents. However, when it is applied to multi-agent learning with many I/O states, it is usually too complex and slow. To overcome this problem in multi-agent learning systems, we propose the successive Q-learning algorithm. Successive Q-learning divides the state-action pairs that agents can have into several Q-functions, reducing complexity and the amount of computation. This algorithm is suitable for multi-agent learning in a dynamically changing environment. The proposed successive Q-learning algorithm is applied to the prey-predator problem with one prey and two predators, and its effectiveness is verified by the efficient avoidance behavior of the prey agent.
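
One plausible reading of the decomposition described above is to keep a separate Q-table per state component (for instance, one per predator) and combine them when evaluating an action; the sketch below, with illustrative names and parameters, follows that reading rather than the paper's exact scheme.

```python
from collections import defaultdict

class SuccessiveQ:
    """Illustrative split of one large Q-function into several smaller per-component
    Q-tables; the exact decomposition used in the paper is not reproduced here."""

    def __init__(self, n_components, alpha=0.1, gamma=0.9):
        self.tables = [defaultdict(float) for _ in range(n_components)]
        self.alpha, self.gamma = alpha, gamma

    def value(self, state_parts, action):
        # Combined value is the sum of the per-component Q-values.
        return sum(t[(s, action)] for t, s in zip(self.tables, state_parts))

    def update(self, state_parts, action, reward, next_parts, actions):
        # Each component table is updated with its own one-step Q-learning rule.
        for t, s, s_next in zip(self.tables, state_parts, next_parts):
            best_next = max(t[(s_next, a)] for a in actions)
            t[(s, action)] += self.alpha * (reward + self.gamma * best_next - t[(s, action)])
```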

Region-based Q-learning for intelligent robot systems (지능형 로보트 시스템을 위한 영역기반 Q-learning)

  • Kim, Jae-Hyeon;Seo, Il-Hong
    • Journal of Institute of Control, Robotics and Systems
    • /
    • v.3 no.4
    • /
    • pp.350-356
    • /
    • 1997
  • It is desirable for autonomous robot systems to possess the ability to behave in a smooth and continuous fashion when interacting with an unknown environment. Since Q-learning requires a great deal of memory and time to optimize a series of actions in a continuous state space, it is not easy to apply the method directly to such a real environment. In this paper, for continuous state space applications, we propose a region-based Q-learning method with a triangular-type Q-value model to address this problem. Our learning method estimates the current Q-value through its relationship with neighboring states and can learn its actions in a manner similar to Q-learning. Thus, our method enables robots to move smoothly in a real environment. To show the validity of our method, a navigation comparison with Q-learning is given, and visual tracking simulation results involving a 2-DOF SCARA robot are also presented.
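
The stated idea of estimating a Q-value from its relationship with neighboring states can be illustrated with a simple distance-weighted interpolation over discrete state centers; this is only a sketch of the general idea under that assumption, not the paper's triangular Q-value model.

```python
import numpy as np

def interpolated_q(Q, centers, x, action):
    """Estimate Q(x, action) for a continuous state x from the Q-values stored at
    neighboring discrete state centers, weighted by inverse distance.

    Q:       (n_centers, n_actions) array of stored Q-values
    centers: (n_centers, state_dim) array of discrete state centers
    x:       (state_dim,) continuous state
    """
    d = np.linalg.norm(centers - x, axis=1)
    if np.any(d < 1e-9):                      # exactly on a center: use its value
        return float(Q[int(np.argmin(d)), action])
    w = 1.0 / d                               # closer centers get more weight
    w /= w.sum()
    return float(np.dot(w, Q[:, action]))
```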

Dodecagon-based Q-learning Algorithm using SVM for Object Search of Robot (로봇의 목표물 추적을 위한 SVM과 12각형 기반의 Q-learning 알고리즘)

  • Seo, Sang-Wook;Jang, In-Hun;Sim, Kwee-Bo
    • Proceedings of the Korean Institute of Intelligent Systems Conference
    • /
    • 2007.11a
    • /
    • pp.227-230
    • /
    • 2007
  • In this paper, we propose a dodecagon-based Q-learning algorithm using SVM for a robot to track a target object. To show the validity of the proposed algorithm, we set up an experiment with two robots, obstacles, and one target, in which each robot tries to find the hidden target. We carry out the experiment using a random search, a fusion model of DBAM and AMAB, and finally the SVM and dodecagon-based Q-learning algorithm proposed in this paper, and we verify the validity of the proposed approach by comparing these three methods.

Q-learning for intersection traffic flow Control based on agents

  • Zhou, Xuan;Chong, Kil-To
    • Proceedings of the IEEK Conference
    • /
    • 2009.05a
    • /
    • pp.94-96
    • /
    • 2009
  • In this paper, we present a Q-learning method for adaptive traffic signal control on the basis of multi-agent technology. The structure is composed of six phase agents and one intersection agent. A wireless communication network provides the possibility of cooperation among the agents. As one kind of reinforcement learning, Q-learning is adopted as the algorithm of the control mechanism, which can acquire optimal control strategies from delayed rewards; furthermore, we adopt a dynamic learning method instead of a static method, which is more practical. Simulation results indicate that it is more effective than a traditional signal system.
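
A minimal sketch of signal control by Q-learning, assuming the state is built from discretized queue lengths and the reward is the negative total delay; the paper's phase-agent and intersection-agent structure is richer, and all names, bin sizes, and parameters below are assumptions.

```python
import random
from collections import defaultdict

def choose_phase(Q, queues, phases, epsilon=0.1):
    """Pick which signal phase to serve next, given per-approach queue lengths."""
    state = tuple(min(q // 5, 3) for q in queues)      # coarse queue-length bins
    if random.random() < epsilon:
        return state, random.choice(phases)
    return state, max(phases, key=lambda p: Q[(state, p)])

def update(Q, state, phase, total_delay, next_state, phases, alpha=0.1, gamma=0.9):
    """One-step Q-learning update with delay-based (delayed) reward."""
    reward = -total_delay                               # less delay -> higher reward
    best_next = max(Q[(next_state, p)] for p in phases)
    Q[(state, phase)] += alpha * (reward + gamma * best_next - Q[(state, phase)])

Q = defaultdict(float)
```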

Area-Based Q-learning Algorithm to Search Target Object of Multiple Robots (다수 로봇의 목표물 탐색을 위한 Area-Based Q-learning 알고리즘)

  • Yoon, Han-Ul;Sim, Kwee-Bo
    • Journal of the Korean Institute of Intelligent Systems
    • /
    • v.15 no.4
    • /
    • pp.406-411
    • /
    • 2005
  • In this paper, we present area-based Q-learning to search for a target object using multiple robots. To search for the target in a Markovian space, the robots should recognize their surroundings at their current location and generate rules to act upon by themselves. Under area-based Q-learning, a robot first obtains six distances from itself to the environment from infrared sensors arranged hexagonally around it. Second, it calculates six areas from those distances and then takes an action, i.e., it turns and moves toward where the widest space is guaranteed. After the action is taken, the Q-value of that state is updated by the corresponding update rule. We set up an experimental environment with five small mobile robots, obstacles, and a target object, and searched for the target object while navigating an unknown hallway in which some obstacles were placed. At the end of this paper, we present the results of three algorithms: random search, area-based action making (ABAM), and hexagonal area-based Q-learning.
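
The area computation described above can be made concrete as follows: with six range readings taken 60 degrees apart, each sector's free area is approximated by the triangle spanned by two adjacent readings, and the robot heads toward the widest sector. Variable names are illustrative.

```python
import math

def sector_areas(distances):
    """Given six range readings taken 60 degrees apart, approximate the free area of
    each of the six sectors as the triangle spanned by two adjacent readings."""
    assert len(distances) == 6
    s = math.sin(math.radians(60))
    return [0.5 * distances[i] * distances[(i + 1) % 6] * s for i in range(6)]

def widest_direction(distances):
    """Action choice used as the state transition: head toward the widest sector."""
    areas = sector_areas(distances)
    return max(range(6), key=lambda i: areas[i])
```

After the move, the Q-value of the previous state and chosen direction would be updated with the usual one-step rule, as the abstract describes.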

Q-learning Using Influence Map (영향력 분포도를 이용한 Q-학습)

  • Sung Yun-Sick;Cho Kyung-Eun
    • Journal of Korea Multimedia Society
    • /
    • v.9 no.5
    • /
    • pp.649-657
    • /
    • 2006
  • Reinforcement learning is a computational approach to learning whereby an agent, interacting with an uncertain environment, takes the action that maximizes the total amount of reward it receives among the possible actions in the current state. Q-learning, one of the most active algorithms in reinforcement learning, is based on the rewards obtained when an agent takes an action. However, it has a problem in mapping the real world to discrete states. When the state space is very large, Q-learning suffers from long learning times. In contrast, when the state space is reduced, many real-world states map to a single state; because the agent then learns only a single action for many different situations, it acts monotonously. In this paper, to reduce the learning time and to complement this simple behavior, we propose Q-learning using an influence map (QIM). By using the influence map and the learning results of adjacent states, an agent can choose a proper action in uncertain states it has not yet learned. Comparing simulation results of QIM and Q-learning, we show that QIM performs as well as Q-learning even though QIM uses only 4.6% of Q-learning's state space. This is because QIM learns about 2.77 times faster than Q-learning, and although the state space needed for learning is reduced, the resulting problem is complemented by the influence map.
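
The way QIM borrows information from adjacent states can be illustrated as a weighted mix of a state's own Q-value and its neighbors' values. The sketch below treats the influence map as a plain dictionary of weights, which is an assumption; the paper defines the map precisely.

```python
def influenced_q(Q, influence, state, action, neighbor_states):
    """Estimate Q(state, action) by mixing the state's own value with those of its
    neighbors, weighted by an influence map.

    Q:               dict keyed by (state, action)
    influence:       dict where influence[(state, n)] is the weight neighbor n
                     contributes to `state` (illustrative representation)
    neighbor_states: list of states adjacent to `state`
    """
    own = Q.get((state, action), 0.0)
    spread = sum(influence.get((state, n), 0.0) * Q.get((n, action), 0.0)
                 for n in neighbor_states)
    total = 1.0 + sum(influence.get((state, n), 0.0) for n in neighbor_states)
    return (own + spread) / total
```

An agent in a state it has barely visited would rank actions by this blended estimate instead of its own (still uninformative) Q-values.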

Area-Based Q-learning for Multiple Robots Control (다수 로봇 제어를 위한 면적 기반 Q-learning)

  • Yoon Han-Ul;Jang In-Hoon;Sim Kwee-Bo
    • Proceedings of the Korean Institute of Intelligent Systems Conference
    • /
    • 2005.04a
    • /
    • pp.198-201
    • /
    • 2005
  • In this paper, we discuss area-based Q-learning for efficiently controlling multiple robots. Each robot has six sensors arranged at 60° intervals, through which it senses the distances between itself and the surrounding environment. Next, it calculates areas in six directions from the acquired distance data and moves toward the region that guarantees the widest range of action for its subsequent movement. Regarding this movement as a transition from one state to another, the robot again calculates the six directional areas after moving and updates the Q-value for the action taken from the previous state to the current state. In the experiment of this paper, five robots were used to find an object hidden among obstacles, and the results of three different control methods are presented: random search, area-based search, and area-based Q-learning search.
