Search | Korea Science

A Performance Improvement Technique for Nash Q-learning using Macro-Actions (매크로 행동을 이용한 내시 Q-학습의 성능 향상 기법)

Sung, Yun-Sik;Cho, Kyun-Geun;Um, Ky-Hyun
- Journal of Korea Multimedia Society / v.11 no.3 / pp.353-363 / 2008
A multi-agent system has a longer learning period and a larger state space than a single-agent system. In this paper, we propose a new method to reduce the learning time of Nash Q-learning in a multi-agent environment. We apply Macro-actions to Nash Q-learning to improve the learning speed. In the Nash Q-learning scheme, when agents select actions, rewards are accumulated as they are for Macro-actions. In the experiments, we compare Nash Q-learning using Macro-actions with general Nash Q-learning. First, we observed how many times the agents achieved their goals. The results of this experiment show that agents using Nash Q-learning and 4 Macro-actions perform 9.46% better than agents using Nash Q-learning with only 4 primitive actions. Second, when agents use Macro-actions, Q-values are accumulated 2.6 times more. Finally, agents using Macro-actions select about 44% fewer actions. As a result, agents select fewer actions and Macro-actions improve the Q-value updates, so the agents' learning speed improves.
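The reward accumulation mentioned in this abstract can be illustrated with a minimal single-agent sketch. It assumes the common convention that a macro-action's reward is the discounted sum of the rewards of its primitive steps; the abstract does not give the paper's exact rule. The function names, the dictionary-based Q-table, and the `next_value` argument are hypothetical. In Nash Q-learning, `next_value` would be the agent's payoff at a Nash equilibrium of the next stage game, which is not computed here.

```python
def accumulate_macro_reward(rewards, gamma=0.9):
    """Discounted sum of the primitive-step rewards collected during one macro-action.

    Returns (accumulated_reward, remaining_discount), where remaining_discount
    is gamma ** len(rewards) and is applied to the value of the state reached
    when the macro-action ends.
    """
    total = 0.0
    discount = 1.0
    for r in rewards:
        total += discount * r
        discount *= gamma
    return total, discount


def macro_q_update(q, state, macro, rewards, next_value, alpha=0.1, gamma=0.9):
    """One Q-value update for a completed macro-action (illustrative only).

    q          -- dict mapping (state, macro) to a Q-value
    rewards    -- rewards observed at each primitive step of the macro-action
    next_value -- value of the state where the macro-action ended (assumption:
                  supplied by the caller, e.g. a Nash value in Nash Q-learning)
    """
    total, discount = accumulate_macro_reward(rewards, gamma)
    old = q.get((state, macro), 0.0)
    target = total + discount * next_value
    q[(state, macro)] = old + alpha * (target - old)
    return q[(state, macro)]
```

Under this convention, a macro-action of three steps triggers a single update that carries three steps' worth of reward, which is one plausible reading of why fewer action selections can still move Q-values faster.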

A Function Approximation Method for Q-learning of Reinforcement Learning (강화학습의 Q-learning을 위한 함수근사 방법)

이영아;정태충
- Journal of KIISE:Software and Applications / v.31 no.11 / pp.1431-1438 / 2004
Reinforcement learning learns policies for accomplishing a task's goal from experience gained through interaction between an agent and its environment. Q-learning, a basic algorithm of reinforcement learning, suffers from the curse of dimensionality and from slow learning in the early stage of learning. To solve these problems, new function approximation methods suitable for reinforcement learning should be studied. In this paper, we propose the Fuzzy Q-Map algorithm, which is based on online fuzzy clustering. Fuzzy Q-Map is a function approximation method suited to reinforcement learning that can learn online and express the uncertainty of the environment. We ran an experiment on the mountain car problem with Fuzzy Q-Map, and the results show that learning is accelerated in the early stage of learning.

Avoidance Behavior of Autonomous Mobile Robots using the Successive Q-learning (연속적인 Q-학습을 이용한 자율이동로봇의 회피행동 구현)

Kim, Min-Soo
- Proceedings of the KIEE Conference / 2001.07d / pp.2660-2662 / 2001
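Q-learning is a recently studied reinforcement learning method that requires no definition of the environment, which makes it well suited to behavior learning for autonomous mobile robots. However, as the environment becomes more complex, as in learning for multi-agent systems, the number of input and output variables of each agent grows and the computational cost of the Q-function increases exponentially. To solve this problem, we propose a successive Q-learning algorithm suited to Q-learning in multi-agent systems. The successive Q-learning algorithm represents all state-action pairs that an agent can have in a single Q-function, reducing computation and complexity so that the agent can actively cope with a dynamically changing environment. We applied the proposed successive Q-learning algorithm to a prey-predator problem consisting of two predators and one prey in a space enclosed by walls, and verified the prey agent's efficient avoidance ability.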

Implementation of the Agent using Universal On-line Q-learning by Balancing Exploration and Exploitation in Reinforcement Learning (강화 학습에서의 탐색과 이용의 균형을 통한 범용적 온라인 Q-학습이 적용된 에이전트의 구현)

박찬건;양성봉
- Journal of KIISE:Software and Applications / v.30 no.7_8 / pp.672-680 / 2003
A shopbot is a software agent whose goal is to maximize the buyer's satisfaction by automatically gathering price and quality information on goods and services from on-line sellers. In response to shopbots' activities, sellers on the Internet need agents called pricebots that can help them maximize their own profits. In this paper, we adopt Q-learning, one of the model-free reinforcement learning methods, as the price-setting algorithm of pricebots. A Q-learned agent increases profitability and eliminates cyclic price wars when compared with agents using the myoptimal (myopically optimal) pricing strategy. Q-learning needs to select a sequence of state-action pairs in order to converge. When state-action pairs are selected uniformly at random, the number of accesses to the Q-tables needed to obtain the optimal Q-values is quite large. Therefore, it is not appropriate for universal on-line learning in a real-world environment. This phenomenon occurs because uniform random selection reflects the uncertainty of exploitation for the optimal policy. In this paper, we propose a Mixed Nonstationary Policy (MNP), which consists of both the auxiliary Markov process and the original Markov process. MNP tries to keep a balance between exploration and exploitation in reinforcement learning. Our experimental results show that the Q-learning agent using MNP converges to the optimal Q-values about 2.6 times faster on average than uniform random selection.

Q-learning Using Influence Map (영향력 분포도를 이용한 Q-학습)

Sung Yun-Sick;Cho Kyung-Eun
- Journal of Korea Multimedia Society / v.9 no.5 / pp.649-657 / 2006
Reinforcement learning is a computational approach to learning in which an agent interacting with an uncertain environment takes, among the actions possible in the current state, the one that maximizes the total reward it receives. Q-learning, one of the most actively studied algorithms in reinforcement learning, is based on the rewards obtained when an agent takes an action. However, it has the problem of mapping the real world to discrete states. When the state space is very large, Q-learning takes a long time to learn. In contrast, when the state space is reduced, many states map to a single state. Because the agent then learns only a single action for many states, it acts monotonously. In this paper, to reduce learning time and compensate for this simplistic behavior, we propose Q-learning using an influence map (QIM). By using the influence map and the learning results of adjacent states, an agent can choose a proper action in an uncertain state that it has not learned. Comparing simulation results for QIM and Q-learning, we show that QIM performs as well as Q-learning even though QIM uses 4.6% of Q-learning's state space. This is because QIM learns about 2.77 times faster than Q-learning and the state space that must be learned is reduced, while the problem this causes is compensated for by the influence map.

Multi Behavior Learning of Lamp Robot based on Q-learning (강화학습 Q-learning 기반 복수 행위 학습 램프 로봇)

Kwon, Ki-Hyeon;Lee, Hyung-Bong
- Journal of Digital Contents Society / v.19 no.1 / pp.35-41 / 2018
The Q-learning algorithm based on reinforcement learning is useful for learning the goal for one behavior at a time, using a combination of discrete states and actions. To learn multiple actions, applying a behavior-based architecture and an appropriate behavior adjustment method can make a robot perform fast and reliable actions. Q-learning is a popular reinforcement learning method and is widely used for robot learning because it is simple, convergent, and little affected by the training environment (off-policy). In this paper, the Q-learning algorithm is applied to a lamp robot to learn multiple behaviors (human recognition, desk object recognition). Because the learning rate of Q-learning may affect the robot's performance while it learns multiple behaviors, we present the optimal multiple-behavior learning model by varying the learning rate.
https://doi.org/10.9728/dcs.2018.19.1.35

Reinforcement Learning based Dynamic Positioning of Robot Soccer Agents (강화학습에 기초한 로봇 축구 에이전트의 동적 위치 결정)

권기덕;김인철
- Proceedings of the Korean Information Science Society Conference / 2001.10b / pp.55-57 / 2001
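In reinforcement learning, an agent learns an optimal behavior policy that maximizes the reward from the environment in which it is placed. Reinforcement learning therefore differs from supervised learning, in which explicit training examples are given as pairs of inputs (states) and outputs (actions). In particular, model-free reinforcement learning such as Q-learning does not need to set up or learn a model of the environment in advance, and can reach an optimal behavior policy as long as it experiences the various states and actions sufficiently often, so it is applied in a variety of application areas. However, the biggest problem that reinforcement learning such as Q-learning faces in real applications is that, for problems with a large state space, it cannot converge to the optimal Q-values for each state and action within a reasonable time, which makes it hard to be effective. With this problem in mind, this paper proposes a new, effective Q-learning method for the dynamic positioning of each player agent in a robot soccer simulation environment. The method improves on conventional Modular Q-Learning, which divides the state space of the original problem into several small modules and simply combines their individual Q-learning results. Instead, it adaptively combines the modules' learning results according to each module's contribution to the reward. This Adaptive Mediation based Modular Q-Learning (AMMQL) can handle large state spaces, as conventional modular Q-learning does, and has the additional advantage of adapting flexibly to more dynamic environmental changes to learn new behavior policies. With these characteristics, AMMQL can be particularly effective in multi-agent environments such as robot soccer, where changes occur continually in real time. This paper introduces the concept of the AMMQL method and presents a detailed design showing how it can be applied to learning the dynamic positioning of robot soccer agents.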

Real-Time Path Planning for Mobile Robots Using Q-Learning (Q-learning을 이용한 이동 로봇의 실시간 경로 계획)

Kim, Ho-Won;Lee, Won-Chang
- Journal of IKEEE / v.24 no.4 / pp.991-997 / 2020
Reinforcement learning has been applied mainly to sequential decision-making problems. In recent years especially, reinforcement learning combined with neural networks has brought successful results in previously unsolved fields. However, reinforcement learning using deep neural networks has the disadvantage that it is too complex for immediate use in the field. In this paper, we implemented a path planning algorithm for mobile robots using Q-learning, one of the easy-to-learn reinforcement learning algorithms. We used real-time Q-learning to update the Q-table in real time, since generating Q-tables in advance has obvious limitations. By adjusting the exploration strategy, we were able to obtain the learning speed required for real-time Q-learning. Finally, we compared the performance of real-time Q-learning and DQN.
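As a rough illustration of updating a Q-table online while the agent moves, here is a minimal tabular Q-learning sketch with epsilon-greedy exploration on a small grid. The grid size, reward values, obstacle layout, hyperparameters, and all function names are assumptions chosen for the example. The abstract does not describe the paper's environment or exploration strategy, so this is not the authors' implementation.

```python
import random

# Four primitive moves on a grid (assumed action set).
ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]


def step(pos, action, size, obstacles, goal):
    """Apply one move; return (next_position, reward, done). Rewards are assumed values."""
    nxt = (pos[0] + action[0], pos[1] + action[1])
    blocked = not (0 <= nxt[0] < size and 0 <= nxt[1] < size) or nxt in obstacles
    if blocked:
        return pos, -1.0, False   # bumped into a wall or obstacle: stay in place
    if nxt == goal:
        return nxt, 10.0, True
    return nxt, -0.1, False       # small step cost favors short paths


def choose_action(q, pos, epsilon, rng):
    """Epsilon-greedy selection over the Q-table (unseen entries default to 0)."""
    if rng.random() < epsilon:
        return rng.randrange(len(ACTIONS))
    values = [q.get((pos, a), 0.0) for a in range(len(ACTIONS))]
    return values.index(max(values))


def train(size=5, goal=(4, 4), obstacles=frozenset({(2, 2)}),
          episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Update the Q-table after every single move instead of precomputing it."""
    rng = random.Random(seed)
    q = {}
    for _ in range(episodes):
        pos = (0, 0)
        for _ in range(200):      # cap on episode length
            a = choose_action(q, pos, epsilon, rng)
            nxt, reward, done = step(pos, ACTIONS[a], size, obstacles, goal)
            best_next = 0.0 if done else max(
                q.get((nxt, b), 0.0) for b in range(len(ACTIONS)))
            old = q.get((pos, a), 0.0)
            q[(pos, a)] = old + alpha * (reward + gamma * best_next - old)
            pos = nxt
            if done:
                break
    return q


def greedy_path(q, size=5, goal=(4, 4), obstacles=frozenset({(2, 2)}), max_steps=50):
    """Follow the learned greedy policy from the start cell."""
    rng = random.Random(0)
    pos, path = (0, 0), [(0, 0)]
    for _ in range(max_steps):
        a = choose_action(q, pos, 0.0, rng)
        pos, _, done = step(pos, ACTIONS[a], size, obstacles, goal)
        path.append(pos)
        if done:
            break
    return path
```

Calling `greedy_path(train())` returns the cells visited under the learned policy. Raising or lowering `epsilon` is the simplest way to experiment with the trade-off between exploration and learning speed that the abstract mentions.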
https://doi.org/10.7471/ikeee.2020.24.4.991

Neural-Q method based on $\varepsilon$-SVR ($\varepsilon$-SVR을 이용한 Neural-Q 기법)

조원희;김영일;박주영
- Proceedings of the Korean Institute of Intelligent Systems Conference / 2002.12a / pp.162-165 / 2002
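Q-learning is a reinforcement learning method that is widely applied in many fields. It has recently been applied successfully to the Linear Quadratic Regulation (LQR) problem. In particular, because it can solve the problem through learning using only appropriate inputs and outputs, without specific information about the parameters of the system model, it can be a very practical alternative in some situations. Neural Q-learning replaces the Q-value of Q-learning with the output of an MLP (multilayer perceptron) neural network, which makes it possible to handle optimal control problems for nonlinear systems. However, because the Neural-Q approach first fixes the network structure and then trains it with the backpropagation algorithm, it inherits limitations such as having to determine the network structure by trial and error, and the network's connection weights converging to local optima because of backpropagation. Therefore, in this paper, instead of using an MLP neural network trained by backpropagation as the tool for Neural-Q learning, we adopt support vector learning, whose performance has recently been recognized in many fields. We propose a Q-value approximation technique using $\varepsilon$-SVR (Epsilon Support Vector Regression) and derive the related equations. We then examine the applicability of the proposed support-vector-learning-based Neural-Q method through simulation.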

Function Approximation for accelerating learning speed in Reinforcement Learning (강화학습의 학습 가속을 위한 함수 근사 방법)

Lee, Young-Ah;Chung, Tae-Choong
- Journal of the Korean Institute of Intelligent Systems / v.13 no.6 / pp.635-642 / 2003
Reinforcement learning has achieved successful results in many applications such as control and scheduling. Various function approximation methods have been studied in order to improve the learning speed and to address the storage shortage of Q-Learning, the standard reinforcement learning algorithm. Most function approximation methods give up some distinctive property of reinforcement learning and need prior knowledge and preprocessing. Fuzzy Q-Learning needs preprocessing to define fuzzy variables, and Locally Weighted Regression (LWR) uses training examples. In this paper, we propose a function approximation method, Fuzzy Q-Map, that is based on on-line fuzzy clustering. Fuzzy Q-Map classifies a query state and predicts a suitable action according to the membership degree. We applied Fuzzy Q-Map, CMAC, and LWR to the mountain car problem. Fuzzy Q-Map reached the optimal prediction rate faster than CMAC, but showed a lower prediction rate than LWR, which uses training examples.
https://doi.org/10.5391/JKIIS.2003.13.6.635

Search Result 288, Processing Time 0.03 seconds
