• Title/Summary/Keyword: 멀티 에이전트 강화학습 (Multi-Agent Reinforcement Learning)


RBFN-based Policy Model for Efficient Multiagent Reinforcement Learning (효율적인 멀티 에이전트 강화학습을 위한 RBFN 기반 정책 모델)

  • Gwon, Gi-Deok; Kim, In-Cheol
    • Proceedings of the Korea Intelligent Information System Society Conference / 2007.11a / pp.294-302 / 2007
  • An important issue in multi-agent reinforcement learning is how an agent should learn its optimal action policy in a dynamic environment where other agents exist that can influence its own performance. Most previous work on multi-agent reinforcement learning either applies single-agent reinforcement learning techniques without significant modification or, even when it builds a separate model of the other agents, requires unrealistic assumptions. This paper introduces an RBFN-based model of the opponent agent's action policy and then describes a reinforcement learning method that uses it. Unlike earlier multi-agent reinforcement learning studies, the proposed method learns an RBFN-based policy model rather than a model of the opponent's Q evaluation function. Moreover, by using a comparatively simple policy model instead of more expressive but costly models such as finite state automata or Markov chains, it improves learning efficiency. The paper introduces the Cat and Mouse game, a representative adversarial multi-agent environment, and analyzes the effect of the proposed RBFN-based policy model through experiments that use this game as a testbed. (A minimal Python sketch of such an RBFN policy model follows this entry.)

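To make the idea above concrete, the following is a minimal sketch, not the authors' implementation, of an RBFN-based model of an opponent's action policy: a radial basis function network maps an observed state vector to a distribution over the opponent's discrete actions, fitted from logged state-action pairs by regularized least squares. The class name, the number of centers, the RBF width, and the four-action setting are all assumptions for illustration.

```python
import numpy as np

class RBFNPolicyModel:
    """Minimal RBF-network model of an opponent's action policy.

    Maps an observed state vector to a distribution over the opponent's
    discrete actions. Centers are sampled from observed states; output
    weights are fit by ridge-regularized least squares on one-hot targets.
    """

    def __init__(self, n_centers=20, gamma=1.0, n_actions=4, seed=0):
        self.n_centers = n_centers
        self.gamma = gamma          # RBF width parameter
        self.n_actions = n_actions
        self.rng = np.random.default_rng(seed)
        self.centers = None
        self.weights = None

    def _features(self, states):
        # Gaussian RBF activations of each state w.r.t. each center.
        d2 = ((states[:, None, :] - self.centers[None, :, :]) ** 2).sum(-1)
        return np.exp(-self.gamma * d2)

    def fit(self, states, actions, reg=1e-3):
        states = np.asarray(states, dtype=float)
        idx = self.rng.choice(len(states), size=min(self.n_centers, len(states)),
                              replace=False)
        self.centers = states[idx]
        phi = self._features(states)                  # (N, C)
        targets = np.eye(self.n_actions)[actions]     # one-hot actions (N, A)
        # Ridge solution: W = (Phi^T Phi + reg I)^-1 Phi^T Y
        a = phi.T @ phi + reg * np.eye(phi.shape[1])
        self.weights = np.linalg.solve(a, phi.T @ targets)

    def predict_proba(self, state):
        phi = self._features(np.asarray(state, dtype=float)[None, :])
        scores = np.clip(phi @ self.weights, 1e-6, None)
        return (scores / scores.sum(axis=1, keepdims=True))[0]
```

Usage would look like `model.fit(observed_states, observed_opponent_actions)` followed by `model.predict_proba(current_state)`; the point of such a model in the paper's setting is that the learning agent can condition its own action choice on the predicted opponent move.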

A Coordination Agent Model based on Extracting Similar Information (유사 정보 추출에 기반한 조정 에이전트 모델)

  • 양소진; 이현수; 오경환
    • Korean Journal of Cognitive Science / v.12 no.1_2 / pp.55-63 / 2001
  • Generally speaking, agent-based technology is a technology for handling the flood of information that has resulted from the popularization of the Internet. An agent system is a distributed multi-agent system consisting of both homogeneous and heterogeneous agents, usually with a coordination agent in between that is in charge of control and message flow among the application agents. The purpose of this paper is to propose a coordination method among agents, some of which provide information and some of which request it. In a multi-agent system, an Information Providing Agent (IPA) registers its capabilities with the Coordination Agent (CA), and an Information Requesting Agent (IRA) asks the CA for what it needs. To coordinate them satisfactorily, the coordination agent must be able to return data to the requester that is reasonably similar to what was intended, even when no exact match exists. To this end, this paper proposes a scheme by which a coordination agent finds the IPA whose information correlates most closely with the IRA's request. (A small similarity-matching sketch follows this entry.)

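As a rough illustration of the matching step described above, the sketch below routes an information request to the registered provider whose capability description correlates most closely with it, using cosine similarity over keyword-count vectors. The similarity measure, the `CoordinationAgent` class, and the method names are hypothetical; the paper's actual extraction and matching scheme may differ.

```python
from collections import Counter
from math import sqrt

def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two keyword-count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

class CoordinationAgent:
    """Routes an information request to the best-matching providing agent."""

    def __init__(self):
        self.capabilities = {}   # provider name -> keyword Counter

    def register_capability(self, provider: str, description: str) -> None:
        self.capabilities[provider] = Counter(description.lower().split())

    def route(self, request: str) -> str:
        query = Counter(request.lower().split())
        return max(self.capabilities,
                   key=lambda p: cosine_similarity(query, self.capabilities[p]))

# Example: the request is routed to the provider whose registered description
# is most similar, even though no provider matches the request exactly.
ca = CoordinationAgent()
ca.register_capability("weather-ipa", "weather forecast temperature rain")
ca.register_capability("stock-ipa", "stock price market finance")
print(ca.route("tomorrow temperature forecast"))   # -> weather-ipa
```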

Policy Modeling for Efficient Reinforcement Learning in Adversarial Multi-Agent Environments (적대적 멀티 에이전트 환경에서 효율적인 강화 학습을 위한 정책 모델링)

  • Kwon, Ki-Duk; Kim, In-Cheol
    • Journal of KIISE: Software and Applications / v.35 no.3 / pp.179-188 / 2008
  • An important issue in multiagent reinforcement learning is how an agent should learn its optimal policy through trial-and-error interactions in a dynamic environment where there exist other agents able to influence its own performance. Most previous works on multiagent reinforcement learning tend to apply single-agent reinforcement learning techniques without any extension, or are based upon unrealistic assumptions even though they build and use explicit models of other agents. In this paper, the basic concepts that constitute the common foundation of multiagent reinforcement learning techniques are first formulated, and previous works are then compared in terms of their characteristics and limitations. After that, a policy model of the opponent agent and a new multiagent reinforcement learning method using this model are introduced. Unlike previous works, the proposed method utilizes a policy model instead of a Q-function model of the opponent agent. Moreover, it improves learning efficiency by using a simpler policy model than richer but time-consuming alternatives such as finite state machines (FSM) and Markov chains. The Cat and Mouse game is introduced as an adversarial multiagent environment, and the effectiveness of the proposed method is analyzed through experiments using this game as a testbed.
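
As a hedged sketch of how a learned opponent policy model can replace an opponent Q-function model in the learning loop, the code below keeps a joint-action value table Q(s, a, o) and bootstraps with the expectation of Q under the modeled opponent policy. This is a generic formulation for illustration, not the paper's exact update rule; `pi_opp` stands for whatever policy model the agent has learned and is left abstract here.

```python
import random
from collections import defaultdict

def expected_q(Q, pi_opp, state, action, opp_actions):
    """Expected value of Q(state, action, o) under the opponent policy model."""
    return sum(pi_opp(state, o) * Q[(state, action, o)] for o in opp_actions)

def update(Q, pi_opp, state, action, opp_action, reward, next_state,
           my_actions, opp_actions, alpha=0.1, gamma=0.9):
    """One Q-learning step over joint actions, bootstrapping with the
    opponent-policy expectation instead of an opponent Q-function model."""
    best_next = max(expected_q(Q, pi_opp, next_state, a, opp_actions)
                    for a in my_actions)
    key = (state, action, opp_action)
    Q[key] += alpha * (reward + gamma * best_next - Q[key])

def choose_action(Q, pi_opp, state, my_actions, opp_actions, epsilon=0.1):
    """Epsilon-greedy over the expected joint-action values."""
    if random.random() < epsilon:
        return random.choice(my_actions)
    return max(my_actions,
               key=lambda a: expected_q(Q, pi_opp, state, a, opp_actions))

# Q is a table over (state, my_action, opponent_action); pi_opp(state, o)
# returns the modeled probability that the opponent plays o in state.
Q = defaultdict(float)
```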

Multi-Agent Reinforcement Learning Model based on Fuzzy Inference (퍼지 추론 기반의 멀티에이전트 강화학습 모델)

  • Lee, Bong-Keun; Chung, Jae-Du; Ryu, Keun-Ho
    • The Journal of the Korea Contents Association / v.9 no.10 / pp.51-58 / 2009
  • Reinforcement learning is a subarea of machine learning concerned with how an agent ought to take actions in an environment so as to maximize some notion of long-term reward. In the multi-agent case in particular, the state space and action space become enormous compared to the single-agent case, so an effective action-selection strategy is needed for reinforcement learning to remain practical. This paper proposes a multi-agent reinforcement learning model based on a fuzzy inference system in order to improve learning speed and select effective actions in a multi-agent setting. The action-selection strategy is verified through evaluation tests on RoboCup Keepaway, one of the standard testbeds for multi-agent research. The proposed model can be used to evaluate the efficiency of various intelligent multi-agent systems and can also be applied to the strategy and tactics of robot soccer.
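
The following is only a guess at the flavor of a fuzzy rule base of the kind the abstract describes: triangular membership functions over a continuous Keepaway-style feature (distance to the nearest taker) activate rules that weight candidate actions, and the resulting weights can bias action selection during early learning. The memberships, rules, and action names are invented for illustration and are not taken from the paper.

```python
def tri(x, a, b, c):
    """Triangular membership function with support [a, c] and peak at b."""
    rising = (x - a) / (b - a) if b > a else 1.0
    falling = (c - x) / (c - b) if c > b else 1.0
    return max(0.0, min(rising, falling))

def fuzzy_action_weights(dist_to_taker):
    """Degree to which each Keepaway-style action is recommended,
    given the distance from the keeper to the nearest taker."""
    near = tri(dist_to_taker, 0.0, 0.0, 5.0)
    mid  = tri(dist_to_taker, 2.0, 6.0, 10.0)
    far  = tri(dist_to_taker, 8.0, 15.0, 15.0)
    # Rule base: IF taker is near THEN pass; IF mid THEN either; IF far THEN hold.
    weights = {
        "pass_ball": near + 0.5 * mid,
        "hold_ball": far + 0.5 * mid,
    }
    total = sum(weights.values()) or 1.0
    return {a: w / total for a, w in weights.items()}

# The normalized weights can bias exploration or initialize Q-values,
# which is one way fuzzy inference can speed up early learning.
print(fuzzy_action_weights(3.0))
```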

A Naive Bayesian-based Model of the Opponent's Policy for Efficient Multiagent Reinforcement Learning (효율적인 멀티 에이전트 강화 학습을 위한 나이브 베이지안 기반 상대 정책 모델)

  • Kwon, Ki-Duk
    • Journal of Internet Computing and Services / v.9 no.6 / pp.165-177 / 2008
  • An important issue in multiagent reinforcement learning is how an agent should learn its optimal policy in a dynamic environment where there exist other agents able to influence its own performance. Most previous works on multiagent reinforcement learning tend to apply single-agent reinforcement learning techniques without any extension, or require unrealistic assumptions even though they use explicit models of other agents. In this paper, a Naive Bayesian policy model of the opponent agent is introduced, and the multiagent reinforcement learning method that uses this model is then explained. Unlike previous works, the proposed method utilizes the Naive Bayesian policy model rather than a Q-function model of the opponent agent. Moreover, it improves learning efficiency by using a simpler policy model than richer but time-consuming alternatives such as finite state machines (FSM) and Markov chains. The Cat and Mouse game is introduced as an adversarial multiagent environment, and the effectiveness of the proposed Naive Bayesian policy model is analyzed through experiments using this game as a testbed. (A minimal Naive Bayes sketch follows this entry.)

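A minimal sketch of a Naive Bayesian opponent policy model of the kind described above: given discrete state features, it estimates P(action | features) from logged opponent moves with Laplace smoothing. The feature names and the four-move action set are assumptions for illustration.

```python
from collections import Counter, defaultdict

class NaiveBayesOpponentModel:
    """Estimates P(opponent action | state features) with Laplace smoothing."""

    def __init__(self, actions):
        self.actions = list(actions)
        self.action_counts = Counter()
        # feature index -> action -> feature value -> count
        self.feature_counts = defaultdict(lambda: defaultdict(Counter))
        # feature index -> set of values seen so far (for smoothing)
        self.feature_values = defaultdict(set)

    def observe(self, state_features, action):
        """Record one observed (state features, opponent action) pair."""
        self.action_counts[action] += 1
        for i, value in enumerate(state_features):
            self.feature_counts[i][action][value] += 1
            self.feature_values[i].add(value)

    def predict_proba(self, state_features):
        total = sum(self.action_counts.values())
        probs = {}
        for a in self.actions:
            # Laplace-smoothed prior P(a) and likelihoods P(feature_i = value | a).
            p = (self.action_counts[a] + 1) / (total + len(self.actions))
            for i, value in enumerate(state_features):
                counts = self.feature_counts[i][a]
                p *= (counts[value] + 1) / (self.action_counts[a] + len(self.feature_values[i]) + 1)
            probs[a] = p
        z = sum(probs.values())
        return {a: p / z for a, p in probs.items()}

# Usage: feed observed opponent moves, then query the action distribution.
model = NaiveBayesOpponentModel(actions=["up", "down", "left", "right"])
model.observe(("cat_near", "wall_left"), "right")
print(model.predict_proba(("cat_near", "wall_left")))
```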

Developing artificial football agents based upon multi-agent techniques in the AI world cup (AI World Cup 환경을 이용한 멀티 에이전트 기반 지능형 가상 축구 에이전트 구현)

  • Lee, Eunhoo; Seong, Hyeon-ah; Jung, Minji; Lee, Hye-in; Joung, Jinoo; Lee, Eui Chul; Lee, Jee Hang
    • Proceedings of the Korea Information Processing Society Conference / 2021.11a / pp.819-822 / 2021
  • The AI World Cup environment is a virtual soccer environment in which multiple virtual agents form teams, interact with each other, and play matches against one another. This paper presents the implementation and simulation results of virtual soccer agents that employ a variety of strategies and tactics using multi-agent learning and reasoning techniques in the AI World Cup environment. First, nine sets of Dynamic planning soccer agents were implemented using logic-based, deductive multi-agent techniques through which the agents cooperate by role to play against an opponent. Next, a learning-based soccer agent was implemented by combining single reinforcement learning agents in an Independent Q-Learning scheme, and this was then extended to multi-agent reinforcement learning to implement and simulate virtual soccer agents capable of role-based strategy learning. Win rates were measured through matches between the implemented agents, and the merits of each strategy were analyzed. Simulation examples are available at https://github.com/I-hate-Soccer/Simulation.
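
The Independent Q-Learning scheme mentioned above can be sketched as each player keeping its own Q table and treating its teammates simply as part of the environment. The code below is a generic illustration under that assumption, not the code in the linked repository; the action set and team size are made up.

```python
import random
from collections import defaultdict

class IndependentQLearner:
    """One per player: each agent learns its own Q table and ignores the
    learning of its teammates (they are just part of the environment)."""

    def __init__(self, actions, alpha=0.1, gamma=0.95, epsilon=0.1):
        self.Q = defaultdict(float)
        self.actions = actions
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def act(self, state):
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.Q[(state, a)])

    def learn(self, state, action, reward, next_state):
        best_next = max(self.Q[(next_state, a)] for a in self.actions)
        self.Q[(state, action)] += self.alpha * (
            reward + self.gamma * best_next - self.Q[(state, action)])

# A team is simply a list of independent learners, e.g. one per role;
# each receives its own observation and (possibly role-shaped) reward
# from the simulator every step.
team = [IndependentQLearner(actions=["move", "kick", "block"]) for _ in range(5)]
```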

A Performance Improvement Technique for Nash Q-learning using Macro-Actions (매크로 행동을 이용한 내시 Q-학습의 성능 향상 기법)

  • Sung, Yun-Sik; Cho, Kyun-Geun; Um, Ky-Hyun
    • Journal of Korea Multimedia Society / v.11 no.3 / pp.353-363 / 2008
  • A multi-agent system has a longer learning period and larger state spaces than a single-agent system. In this paper, we suggest a new method to reduce the learning time of Nash Q-learning in a multi-agent environment: we apply macro-actions to Nash Q-learning to improve the learning speed. In the Nash Q-learning scheme, when agents select actions, rewards are accumulated as in macro-actions. In the experiments, we compare Nash Q-learning using macro-actions with ordinary Nash Q-learning. First, we observed how many times the agents achieved their goals; agents using Nash Q-learning with 4 macro-actions performed 9.46% better than Nash Q-learning using only 4 primitive actions. Second, when agents use macro-actions, Q-values are accumulated 2.6 times more often. Finally, agents using macro-actions select about 44% fewer actions. As a result, agents select fewer actions, macro-actions improve the Q-value updates, and the agents' learning speed improves. (A short macro-action update sketch follows this entry.)

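The macro-action mechanism can be illustrated with a standard semi-Markov-style update: a macro-action executes a sequence of primitive actions, the rewards collected along the way are accumulated with discounting, and a single Q update is applied with the discount raised to the number of primitive steps taken. This is a generic sketch, not the paper's Nash Q-learning implementation, which additionally computes an equilibrium over joint action values; the `env.step` interface is an assumption.

```python
from collections import defaultdict

def run_macro_action(env, state, primitive_actions, gamma=0.9):
    """Execute a fixed sequence of primitive actions and accumulate the
    discounted reward observed along the way."""
    total_reward, discount = 0.0, 1.0
    for a in primitive_actions:
        next_state, reward, done = env.step(state, a)   # assumed env interface
        total_reward += discount * reward
        discount *= gamma
        state = next_state
        if done:
            break
    return state, total_reward, discount

def macro_q_update(Q, state, macro, env, all_macros, alpha=0.1, gamma=0.9):
    """One Q-learning update over macro-actions; the bootstrap term is
    discounted by gamma^k, where k is the number of primitive steps taken."""
    next_state, reward, discount = run_macro_action(env, state, macro, gamma)
    best_next = max(Q[(next_state, m)] for m in all_macros)
    Q[(state, macro)] += alpha * (reward + discount * best_next - Q[(state, macro)])
    return next_state

Q = defaultdict(float)
# Each macro is a tuple of primitive actions; single-step macros recover
# ordinary Q-learning, longer ones let reward accumulate over several steps.
MACROS = [("up",), ("down",), ("left",), ("right",), ("up", "up", "right")]
```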

Comparison of Deep Learning Activation Functions for Performance Improvement of a 2D Shooting Game Learning Agent (2D 슈팅 게임 학습 에이전트의 성능 향상을 위한 딥러닝 활성화 함수 비교 분석)

  • Lee, Dongcheul; Park, Byungjoo
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.19 no.2 / pp.135-141 / 2019
  • Recently, there has been active research on building artificial intelligence agents that learn how to play a game using reinforcement learning. Learning performance can vary considerably depending on which deep learning activation functions are used when training the agent. This paper compares activation functions when training an agent to play a 2D shooting game with reinforcement learning. We defined performance metrics to analyze the results and plotted them over training time. As a result, we found that ELU (Exponential Linear Unit) with parameter 1.0 achieved better rewards than the other activation functions; there was a 23.6% gap between the best and the worst activation function.
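
For reference, the ELU activation that the abstract reports as the best performer is defined as f(x) = x for x > 0 and alpha * (exp(x) - 1) otherwise. The short sketch below evaluates it against ReLU at alpha = 1.0, the parameter value mentioned above; the training pipeline itself is not reproduced here.

```python
import numpy as np

def elu(x, alpha=1.0):
    """Exponential Linear Unit: x for x > 0, alpha * (exp(x) - 1) otherwise."""
    x = np.asarray(x, dtype=float)
    return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))

def relu(x):
    x = np.asarray(x, dtype=float)
    return np.maximum(x, 0.0)

xs = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print("ELU :", elu(xs, alpha=1.0))   # negative inputs saturate toward -alpha
print("ReLU:", relu(xs))             # negative inputs are clipped to zero
```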

Multi-Agent Control Strategy using Reinforcement Learning (강화학습을 이용한 다중 에이전트 제어 전략)

  • 이형일
    • Journal of Korea Multimedia Society / v.6 no.5 / pp.937-944 / 2003
  • The most important problems in a multi-agent system are to accomplish a goal through the efficient coordination of several agents and to prevent collisions with other agents. In this paper, we propose a new control strategy for efficiently achieving the goal of the prey pursuit problem. Our control method uses reinforcement learning to control the multi-agent system and considers the distance as well as the spatial relationship among the agents in the state space of the prey pursuit problem. (A hypothetical state-encoding sketch follows this entry.)

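One way to read the state-space design mentioned above is that each predator's state encodes both its distance to the prey and its spatial relationship to the other predators. The sketch below is a hypothetical encoding of that idea on a grid, not the paper's actual state definition.

```python
def pursuit_state(predator, prey, other_predators):
    """Hypothetical discrete state for one predator in a prey-pursuit grid:
    (distance band to prey, direction to prey, relative quadrants of allies)."""
    dx, dy = prey[0] - predator[0], prey[1] - predator[1]
    dist = abs(dx) + abs(dy)                       # Manhattan distance
    dist_band = 0 if dist <= 2 else (1 if dist <= 5 else 2)
    direction = ((dx > 0) - (dx < 0), (dy > 0) - (dy < 0))
    # Coarse relation to teammates: which quadrants (relative to me) they occupy.
    quadrants = frozenset(
        ((ox > predator[0]) - (ox < predator[0]),
         (oy > predator[1]) - (oy < predator[1]))
        for ox, oy in other_predators)
    return dist_band, direction, quadrants

# Example: two predators converging on a prey at (5, 5).
print(pursuit_state((3, 4), (5, 5), [(6, 2)]))
```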

Q-learning Using Influence Map (영향력 분포도를 이용한 Q-학습)

  • Sung, Yun-Sick; Cho, Kyung-Eun
    • Journal of Korea Multimedia Society / v.9 no.5 / pp.649-657 / 2006
  • Reinforcement learning is a computational approach to learning whereby an agent, interacting with an uncertain environment, takes the action that maximizes the total amount of reward it receives among the possible actions in its current state. Q-learning, one of the most widely used reinforcement learning algorithms, updates its values from the rewards obtained when the agent takes actions, but it has a problem in mapping the real world to discrete states. When the state space is very large, Q-learning takes a long time to learn; conversely, when the state space is reduced, many world states map to a single learned state, so the agent learns only a single action for many situations and behaves monotonously. In this paper, to reduce the learning time and to complement this simplistic behavior, we propose Q-learning using an influence map (QIM). By using the influence map and the learning results of adjacent state spaces, the agent can choose a proper action in an uncertain state that it has not yet learned. Comparing simulation results of QIM and Q-learning, we show that QIM performs as well as Q-learning even though it uses only 4.6% of Q-learning's state space. This is because QIM learns about 2.77 times faster than Q-learning, and the problem caused by reducing the state space is compensated for by the influence map. (A small influence-map sketch follows this entry.)

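The influence-map idea, using the learning results of adjacent states to act sensibly in a state that has not yet been visited, can be sketched as blending a state's own Q-value with an influence-weighted average over its neighbors. The grid neighborhood and the blending weight below are assumptions for illustration, not the paper's influence distribution.

```python
from collections import defaultdict

def influence_q(Q, state, action, neighbors, self_weight=0.5):
    """Blend a state's own Q-value with an influence-map average of its
    neighbors' Q-values, so unvisited states borrow from nearby experience."""
    neighbor_avg = (sum(Q[(n, action)] for n in neighbors) / len(neighbors)
                    if neighbors else 0.0)
    return self_weight * Q[(state, action)] + (1 - self_weight) * neighbor_avg

def grid_neighbors(state):
    """4-neighborhood on a grid; states are (x, y) cells."""
    x, y = state
    return [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]

def choose_action(Q, state, actions):
    return max(actions, key=lambda a: influence_q(Q, state, a, grid_neighbors(state)))

Q = defaultdict(float)
Q[((2, 3), "up")] = 1.0                 # learned in a neighboring cell
print(choose_action(Q, (2, 2), ["up", "down", "left", "right"]))  # -> "up"
```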