Search | Korea Science

Policy Modeling for Efficient Reinforcement Learning in Adversarial Multi-Agent Environments (적대적 멀티 에이전트 환경에서 효율적인 강화 학습을 위한 정책 모델링)

Kwon, Ki-Duk;Kim, In-Cheol
- Journal of KIISE:Software and Applications / v.35 no.3 / pp.179-188 / 2008
An important issue in multiagent reinforcement learning is how an agent should learn its optimal policy through trial-and-error interactions in a dynamic environment where other agents can influence its performance. Most previous work on multiagent reinforcement learning either applies single-agent reinforcement learning techniques without any extension or relies on unrealistic assumptions, even when it builds and uses explicit models of other agents. In this paper, the basic concepts that form the common foundation of multiagent reinforcement learning techniques are first formulated, and previous work is then compared on the basis of these concepts in terms of characteristics and limitations. After that, a policy model of the opponent agent and a new multiagent reinforcement learning method using this model are introduced. Unlike previous work, the proposed method uses a policy model of the opponent agent instead of a model of its Q function. Moreover, it can improve learning efficiency by using a simpler policy model than richer but more time-consuming ones such as finite state machines (FSMs) and Markov chains. The Cat and Mouse game is introduced as an adversarial multiagent environment, and the effectiveness of the proposed method is analyzed through experiments using this game as a testbed.
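The idea of learning against a policy model of the opponent can be illustrated with a minimal sketch. This is not the paper's algorithm: it assumes a tabular setting, a count-based estimate of the opponent's action probabilities per state, and Q-values over joint actions. The class name `OpponentModelQLearner` and all of its parameters are hypothetical.

```python
import random
from collections import defaultdict


class OpponentModelQLearner:
    """Tabular Q-learning over joint actions with a count-based opponent policy model.

    Illustrative assumption only; the listed paper's actual policy model is not
    described in the abstract.
    """

    def __init__(self, my_actions, opp_actions, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
        self.my_actions = list(my_actions)
        self.opp_actions = list(opp_actions)
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.q = defaultdict(float)  # (state, my_action, opp_action) -> value
        self.counts = defaultdict(lambda: defaultdict(int))  # state -> opp_action -> count
        self.rng = random.Random(seed)

    def opponent_policy(self, state):
        """Estimated P(opp_action | state); uniform when the state is unseen."""
        seen = self.counts[state]
        total = sum(seen.values())
        if total == 0:
            return {b: 1.0 / len(self.opp_actions) for b in self.opp_actions}
        return {b: seen[b] / total for b in self.opp_actions}

    def expected_value(self, state, action):
        """Value of our action, averaged over the modeled opponent policy."""
        pi = self.opponent_policy(state)
        return sum(pi[b] * self.q[(state, action, b)] for b in self.opp_actions)

    def act(self, state):
        """Epsilon-greedy choice with respect to the expected value."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.my_actions)
        return max(self.my_actions, key=lambda a: self.expected_value(state, a))

    def update(self, state, action, opp_action, reward, next_state):
        """Update the opponent model, then apply a one-step Q-learning backup."""
        self.counts[state][opp_action] += 1
        best_next = max(self.expected_value(next_state, a) for a in self.my_actions)
        key = (state, action, opp_action)
        self.q[key] += self.alpha * (reward + self.gamma * best_next - self.q[key])
```

The count table is about the simplest policy model available; it stands in here for the "simpler" model the abstract contrasts with FSMs and Markov chains, without claiming to match it.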

The Effects of a Peer Agent on Achievement and Self-Efficacy in Programming Education (프로그래밍 교육에서 동료 에이전트가 학업성취도와 자기효능감에 미치는 영향)

Han, Keun-Woo;Lee, Eun-Kyoung;Lee, Young-Jun
- The Journal of Korean Association of Computer Education / v.10 no.5 / pp.43-51 / 2007
We developed a peer agent to support programming learning and analyzed its educational effects in a programming course. The agent acts as a tutor or a tutee; these roles resemble the navigator and driver roles in pair programming. While students learn with the peer agent, their programming abilities are modeled. Based on this student model, the peer agent provides appropriate feedback and content to the learner. The peer agent had positive effects on learners' achievement and self-efficacy in the programming course, which suggests that the peer agent system helps the learner in the affective domain as well as the cognitive domain.

A Study of Communication between Multi-Agents for Web Based Collaborative Learning (웹기반 협력 학습을 위한 멀티에이전트간의 통신에 관한 연구)

Lee, Chul-Hwan;Han, Sun-Gwan
- Journal of The Korean Association of Information Education / v.3 no.2 / pp.41-53 / 2000
This paper studies communication between multiple agents to support student learning in web-based collaborative learning. First, the study surveys the overall structure and characteristics of agent systems and discusses KQML, a communication language for multi-agent systems. We then propose an architecture for an agent-based collaborative learning system and a method of interaction between agents using KQML. We designed and implemented the collaborative learning system in the Java programming language, and we demonstrate through experiments the efficiency that communication between multiple agents brings to the system.

Labeling Q-Learning for Maze Problems with Partially Observable States

Lee, Hae-Yeon;Hiroyuki Kamaya;Kenich Abe
- Institute of Control, Robotics and Systems: Conference Proceedings / 2000.10a / pp.489-489 / 2000
Recently, reinforcement learning (RL) methods have been used for learning problems in partially observable Markov decision process (POMDP) environments. Conventional RL methods, however, have limited applicability to POMDPs. To overcome partial observability, several algorithms have been proposed [5], [7]. The aim of this paper is to extend our previous algorithm for POMDPs, called Labeling Q-learning (LQ-learning), which supplements incomplete perceptual information with labels. In LQ-learning, the agent perceives the current state as a pair of an observation and its label, so it can more accurately distinguish states that look the same. Labeling is carried out by a hash-like function, which we call the labeling function (LF). Many labeling functions can be considered; in this paper, we introduce several that are based on only the 2 or 3 most recent observations. We briefly introduce the basic idea of LQ-learning, apply it to maze problems, which are simple POMDP environments, and show its usefulness with empirical results that compare favorably with conventional RL algorithms.
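The observation-plus-label idea can be sketched as follows. This is a rough illustration under stated assumptions, not the authors' labeling functions: observations are assumed to be integers, the label is a small polynomial hash of the last few past observations, and `make_labeling_function` and `LabeledQLearner` are hypothetical names.

```python
from collections import defaultdict, deque


def make_labeling_function(history_length=2, num_labels=8):
    """Return a function mapping the last few observations to a small integer label."""
    def label(history):
        recent = tuple(history)[-history_length:]
        h = 0
        for obs in recent:
            # Deterministic polynomial hash so labels are reproducible across runs.
            h = (h * 31 + obs) % num_labels
        return h
    return label


class LabeledQLearner:
    """Q-learning whose state is the pair (observation, label)."""

    def __init__(self, actions, labeling_function, history_length=2, alpha=0.1, gamma=0.9):
        self.actions = list(actions)
        self.label = labeling_function
        self.history = deque(maxlen=history_length)  # immediate past observations
        self.alpha, self.gamma = alpha, gamma
        self.q = defaultdict(float)  # ((observation, label), action) -> value

    def perceive(self, observation):
        """Build the labeled state from past observations, then record the new one."""
        state = (observation, self.label(self.history))
        self.history.append(observation)
        return state

    def update(self, state, action, reward, next_state):
        """Standard one-step Q-learning backup on labeled states."""
        best_next = max(self.q[(next_state, a)] for a in self.actions)
        key = (state, action)
        self.q[key] += self.alpha * (reward + self.gamma * best_next - self.q[key])
```

Two maze cells that produce the same observation but are reached along different paths usually receive different labels, so their Q-values are kept apart; hash collisions can still alias them, which is one reason the choice of labeling function matters.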

Study on Enhancing Training Efficiency of MARL for Swarm Using Transfer Learning (전이학습을 활용한 군집제어용 강화학습의 효율 향상 방안에 관한 연구)

Seulgi Yi;Kwon-Il Kim;Sukmin Yoon
- Journal of the Korea Institute of Military Science and Technology / v.26 no.4 / pp.361-370 / 2023
Swarms have recently become a critical component of offensive and defensive systems. Multi-agent reinforcement learning (MARL) enables swarm systems to handle a wide range of scenarios. However, the main challenge is MARL's scalability: as the number of agents increases, learning performance decreases. In this study, transfer learning is applied to an advanced MARL algorithm to address the scalability issue. Validation results show that training efficiency improved significantly, with computational time reduced by 31%.
https://doi.org/10.9766/KIMST.2023.26.4.361

Stochastic Initial States Randomization Method for Robust Knowledge Transfer in Multi-Agent Reinforcement Learning (멀티에이전트 강화학습에서 견고한 지식 전이를 위한 확률적 초기 상태 랜덤화 기법 연구)

Dohyun Kim;Jungho Bae
- Journal of the Korea Institute of Military Science and Technology / v.27 no.4 / pp.474-484 / 2024
Reinforcement learning, which is also studied in the field of defense, faces the problem of sample inefficiency: it requires a large amount of data to train. Transfer learning has been introduced to address this problem, but its effectiveness is sometimes marginal because the model does not effectively leverage prior knowledge. In this study, we propose a stochastic initial state randomization (SISR) method that enables robust knowledge transfer by promoting generalized and sufficient transfer of knowledge. We developed a simulation environment involving a cooperative robot transportation task. Experimental results show that the task succeeds when SISR is applied and fails when it is not. We also analyzed how the amount of state information collected by the agents changes when SISR is applied.
https://doi.org/10.9766/KIMST.2024.27.4.474

E-commerce Agents using Reinforcement Learning (강화 학습을 이용한 전자 상거래 에이전트)

윤지현;김일곤
- Journal of KIISE:Software and Applications / v.30 no.5_6 / pp.579-586 / 2003
Agents are well suited to e-commerce applications because they act autonomously and interact with a dynamic environment. In this paper, we propose e-commerce agents that use reinforcement learning. We modify a reinforcement learning algorithm so that the agents behave intelligently and can carry out transactions as practical business entities on behalf of a person. To show the validity of this approach, we classify the agents into buying agents and selling agents and assign each a level according to its degree of learning and communication. Finally, we implement an e-commerce framework and present the results. We present a design of e-commerce agents based on the proposed learning algorithm and show that the agents are capable of conducting transactions in practical e-commerce.

Mean Field Game based Reinforcement Learning for Weapon-Target Assignment (평균 필드 게임 기반의 강화학습을 통한 무기-표적 할당)

Shin, Min Kyu;Park, Soon-Seo;Lee, Daniel;Choi, Han-Lim
- Journal of the Korea Institute of Military Science and Technology / v.23 no.4 / pp.337-345 / 2020
The weapon-target assignment (WTA) problem can be formulated as an optimization problem that minimizes the threat posed by targets. Existing methods trade off optimality against execution time to meet various mission objectives. We propose a multi-agent reinforcement learning algorithm for WTA based on mean field games to solve the problem in real time with near-optimal accuracy. The mean field game is a recently introduced method for relieving the curse of dimensionality in multi-agent learning algorithms. In addition, previous reinforcement learning models for WTA generally do not consider weapon interference, which may be critical in real-world operations. We therefore modify the reward function to discourage the crossing of weapon trajectories. The feasibility of the proposed method was verified through real-time simulation of a WTA problem with multiple targets, in which the proposed algorithm assigned weapons to all targets without crossing weapon trajectories.
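For orientation, the optimization view of WTA can be made concrete with a small sketch. It assumes the commonly used static formulation, in which each target's threat value is scaled by the probability that it survives every weapon assigned to it; the paper's exact objective, its mean field game learner, and its trajectory-crossing penalty are not reproduced here. The function names and the numbers in the usage example are hypothetical.

```python
from itertools import product


def expected_surviving_threat(assignment, threat, kill_prob):
    """Total threat left after engagement; assignment[i] is the target of weapon i."""
    survive = list(threat)
    for weapon, target in enumerate(assignment):
        # Each weapon independently reduces the survival probability of its target.
        survive[target] *= 1.0 - kill_prob[weapon][target]
    return sum(survive)


def brute_force_wta(threat, kill_prob):
    """Exhaustive search over all assignments; only feasible for tiny instances."""
    num_weapons, num_targets = len(kill_prob), len(threat)
    return min(
        product(range(num_targets), repeat=num_weapons),
        key=lambda a: expected_surviving_threat(a, threat, kill_prob),
    )


# Hypothetical instance: two weapons, two targets.
threat = [10.0, 5.0]
kill_prob = [[0.9, 0.1],   # weapon 0 against targets 0 and 1
             [0.2, 0.8]]   # weapon 1 against targets 0 and 1
best = brute_force_wta(threat, kill_prob)
```

The search space grows as targets to the power of weapons, which is why exact search stops being practical quickly and why the abstract emphasizes the trade-off between optimality and execution time.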
https://doi.org/10.9766/KIMST.2020.23.4.337

Performance Evaluation of Reinforcement Learning Algorithm for Control of Smart TMD (스마트 TMD 제어를 위한 강화학습 알고리즘 성능 검토)

Kang, Joo-Won;Kim, Hyun-Su
- Journal of Korean Association for Spatial Structures / v.21 no.2 / pp.41-48 / 2021
Smart tuned mass dampers (TMDs) are widely studied for reducing the seismic response of various structures. The control algorithm is the most important factor in the control performance of a smart TMD. This study used Deep Deterministic Policy Gradient (DDPG), a reinforcement learning technique, to develop a control algorithm for a smart TMD. A magnetorheological (MR) damper was used to build the smart TMD. A single-mass model with the smart TMD was used as the reinforcement learning environment. Time history analysis simulations of the example structure subjected to an artificial seismic load were performed during the reinforcement learning process. An actor (policy network) and a critic (value network) were constructed for the DDPG agent. The action of the DDPG agent was the command voltage sent to the MR damper. The reward for each action was calculated from the displacement and velocity responses of the main mass. A groundhook control algorithm was used for comparison. After training the DDPG agent model for 10,000 episodes with suitable hyperparameters, a semi-active control algorithm for the seismic responses of the example structure with the smart TMD was obtained. The simulation results showed that the developed DDPG model can provide effective control algorithms for smart TMDs to reduce seismic responses.
https://doi.org/10.9712/KASS.2021.21.2.41

Design intelligent web-agent system using learning method (학습 방법을 이용한 지능형 웹 에이전트 시스템 설계)

이말례;남태우
- Journal of the Korean Society for Information Management / v.14 no.2 / pp.285-301 / 1997
A massive amount of information is provided to internet users. As a result, users are also exposed to useless information. In this paper, an Intelligent Web-Agent system is presented as a solution to this inconvenience. The system lets users search by a keyword on which they want information and recommends the sites most closely related to that keyword, judging by the users' evaluations and by a case base that the system itself has built previously, so that users can reach the essential web sites in a short time. The Intelligent Web-Agent system is composed of an interface system and a learning system. According to the experiment, searching with the Intelligent Web-Agent system was quicker than searching without it.

Search Result 448, Processing Time 0.026 seconds
