Search | Korea Science

Control for Manipulator of an Underwater Robot Using Meta Reinforcement Learning (메타강화학습을 이용한 수중로봇 매니퓰레이터 제어)

Moon, Ji-Youn;Moon, Jang-Hyuk;Bae, Sung-Hoon
- The Journal of the Korea institute of electronic communication sciences, v.16 no.1, pp.95-100, 2021
This paper introduces model-based meta reinforcement learning as a control method for the manipulator of an underwater construction robot. Model-based meta reinforcement learning quickly updates the model using recent experience from the real application and passes the model to model predictive control, which computes the control inputs that move the manipulator to the target position. The simulation environment for model-based meta reinforcement learning is built using MuJoCo and Gazebo. The real environment for manipulator control of the underwater construction robot is set up to deal with model uncertainties.
https://doi.org/10.13067/JKIECS.2021.16.1.95 KSCI

A Naive Bayesian-based Model of the Opponent's Policy for Efficient Multiagent Reinforcement Learning (효율적인 멀티 에이전트 강화 학습을 위한 나이브 베이지만 기반 상대 정책 모델)

Kwon, Ki-Duk
- Journal of Internet Computing and Services, v.9 no.6, pp.165-177, 2008
An important issue in multiagent reinforcement learning is how an agent should learn its optimal policy in a dynamic environment where other agents can influence its performance. Most previous work on multiagent reinforcement learning either applies single-agent reinforcement learning techniques without any extension or, even when it uses explicit models of other agents, requires unrealistic assumptions. This paper introduces a Naive Bayesian-based policy model of the opponent agent and then explains the multiagent reinforcement learning method that uses this model. Unlike previous work, the proposed method uses a Naive Bayesian-based policy model of the opponent agent rather than a model of its Q function. Moreover, this method can improve learning efficiency because its model is simpler than richer but more time-consuming policy models such as Finite State Machines (FSM) and Markov chains. The Cat and Mouse game is introduced as an adversarial multiagent environment, and the effectiveness of the proposed Naive Bayesian-based policy model is analyzed through experiments using this game as a test-bed.

Two Circle-based Aircraft Head-on Reinforcement Learning Technique using Curriculum (커리큘럼을 이용한 투서클 기반 항공기 헤드온 공중 교전 강화학습 기법 연구)

Insu Hwang;Jungho Bae
- Journal of the Korea Institute of Military Science and Technology, v.26 no.4, pp.352-360, 2023
Recently, AI pilots based on reinforcement learning have been developing to a level where they are more flexible than rule-based methods and can replace human pilots. In this paper, a curriculum is used to help reinforcement learning of head-on combat. Head-on combat is not easy to learn with reinforcement learning without a curriculum; with the proposed two-circle-based head-on air combat learning technique, the ownship faces gradually increasing difficulty and becomes proficient at head-on combat. On the two-circle, the ATA angle between the ownship and the target gradually increased and the AA angle gradually decreased while learning was conducted. Models trained with and without the curriculum were engaged against a rule-based model. The win ratio of the curriculum-based model rose to nearly 100 %, confirming its superior performance.
https://doi.org/10.9766/KIMST.2023.26.4.352

A Study on Performance Improvement of Evolutionary Algorithms Using Reinforcement Learning (강화학습을 이용한 진화 알고리즘의 성능개선에 대한 연구)

이상환;심귀보
- Proceedings of the Korean Institute of Intelligent Systems Conference, 1998.10a, pp.420-426, 1998
Evolutionary algorithms are probabilistic optimization algorithms based on the model of natural evolution. Recently, extensive efforts have been made to improve the performance of evolutionary algorithms. In this paper, we introduce research on improving the convergence rate and search capability of evolutionary algorithms by using reinforcement learning. After providing an introduction to evolutionary algorithms and reinforcement learning, we present adaptive genetic algorithms, reinforcement genetic programming, and reinforcement evolution strategies, each of which is combined with reinforcement learning. Adaptive genetic algorithms generate the mutation probability of each locus by interacting with the environment according to reinforcement learning. Reinforcement genetic programming executes crossover and mutation operations based on the reinforcement and inhibition mechanisms of reinforcement learning. Reinforcement evolution strategies use the variances in fitness caused by mutation to produce the reinforcement signals that estimate and control the step length.

Credit-Assigned-CMAC-based Reinforcement Learning with application to the Acrobot Swing Up Control Problem (Acrobot Swing Up 제어를 위한 Credit-Assigned-CMAC 기반의 강화학습)

Shin, Yeon-Yong;Jang, Si-Young;Seo, Seung-Hwan;Suh, Il-Hong
- Proceedings of the KIEE Conference, 2003.11c, pp.621-624, 2003
For real-world applications of reinforcement learning techniques, function approximation or generalization is required to avoid the curse of dimensionality. To this end, an improved function approximation-based reinforcement learning method is proposed that speeds up convergence by using CA-CMAC (Credit-Assigned Cerebellar Model Articulation Controller). To show that the proposed CACRL (CA-CMAC-based Reinforcement Learning) performs better than CRL (CMAC-based Reinforcement Learning), computer simulation results are presented for a swing-up control problem of an acrobot.

Aspect-based Sentiment Analysis of Product Reviews using Multi-agent Deep Reinforcement Learning

M. Sivakumar;Srinivasulu Reddy Uyyala
- Asia pacific journal of information systems, v.32 no.2, pp.226-248, 2022
The existing model for sentiment analysis of product reviews learned from past data, and new data was labeled based on that training; the new data itself was never used by the existing system for making decisions. The proposed Aspect-based multi-agent Deep Reinforcement learning Sentiment Analysis (ADRSA) model learns from its very first data without the help of any training dataset and labels each sentence with an aspect category and a sentiment polarity. It keeps learning from new data and updates its knowledge to improve its intelligence, so the decisions of the proposed system change over time based on the new data. As a result, the accuracy of sentiment analysis using deep reinforcement learning improved over supervised and unsupervised learning methods. Hence, the sentiments of premium customers on a particular site can be presented to other customers effectively. A dynamic environment with a strong knowledge base helps the system remember sentences, and the use of the State-Action-Reward-State-Action (SARSA) algorithm with the Bidirectional Encoder Representations from Transformers (BERT) model improved the accuracy of the proposed system compared with state-of-the-art methods.
https://doi.org/10.14329/apjis.2022.32.2.226

Solving Survival Gridworld Problem Using Hybrid Policy Modified Q-Based Reinforcement

Montero, Vince Jebryl;Jung, Woo-Young;Jeong, Yong-Jin
- Journal of IKEEE, v.23 no.4, pp.1150-1156, 2019
This paper explores a model-free, value-based approach to solving the survival gridworld problem. The survival gridworld problem poses a challenge that involves taking risks to gain better rewards. The classic value-based approach in model-free reinforcement learning assumes minimal-risk decisions. The proposed method applies hybrid on-policy and off-policy updates to experience roll-outs using a modified Q-based update equation that introduces a parametric linear rectifier and a motivational discount. The significance of this approach is that it allows model-free training of agents that take risk factors and motivated exploration into account to make better path decisions. Experiments suggest that the proposed method achieves better exploration and path selection, resulting in higher episode scores than classic off-policy and on-policy Q-based updates.
https://doi.org/10.7471/ikeee.2019.23.4.1150 KSCI
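
Note: for readers unfamiliar with the two baselines named in the abstract above, the following is a minimal sketch of the textbook off-policy (Q-learning) and on-policy (SARSA) tabular updates. It is not the paper's modified update equation; the parametric linear rectifier and motivational discount are not reproduced here. The function names, the dictionary-based Q table, and the default `alpha` and `gamma` values are illustrative assumptions.

```python
def q_learning_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """Classic off-policy update: bootstrap from the best action value in s_next."""
    old = Q.get((s, a), 0.0)
    best_next = max(Q.get((s_next, b), 0.0) for b in actions)
    Q[(s, a)] = old + alpha * (r + gamma * best_next - old)
    return Q[(s, a)]


def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9):
    """Classic on-policy update: bootstrap from the action actually taken in s_next."""
    old = Q.get((s, a), 0.0)
    next_value = Q.get((s_next, a_next), 0.0)
    Q[(s, a)] = old + alpha * (r + gamma * next_value - old)
    return Q[(s, a)]


# Hypothetical two-action example: the same transition gives different targets.
actions = ["left", "right"]
Q_off = {("s1", "left"): 1.0, ("s1", "right"): 2.0}
Q_on = dict(Q_off)
off_value = q_learning_update(Q_off, "s0", "right", 1.0, "s1", actions)
on_value = sarsa_update(Q_on, "s0", "right", 1.0, "s1", "left")
```

The only difference between the two is the bootstrap term: Q-learning uses the maximum over next actions, while SARSA uses the value of the next action the agent actually chose.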

Fault-tolerant control system for once-through steam generator based on reinforcement learning algorithm

Li, Cheng;Yu, Ren;Yu, Wenmin;Wang, Tianshu
- Nuclear Engineering and Technology, v.54 no.9, pp.3283-3292, 2022
Based on the Deep Q-Network (DQN) algorithm of reinforcement learning, an active fault-tolerance method with incremental actions is proposed for the control system of the once-through steam generator (OTSG) under sensor faults. We first establish the OTSG model as the interaction environment for the reinforcement learning agent. The agent chooses an action according to the system state obtained from the pressure sensor, the incremental actions gradually approach the optimal strategy for the current fault, and the agent then updates the network using the different rewards obtained during the interaction. In this way, the active fault-tolerant control process of the OTSG is transformed into the decision-making process of the reinforcement learning agent. Comparison experiments against the traditional reinforcement learning (RL) algorithm with fixed strategies show that the active fault-tolerant controller designed in this paper can control accurately and rapidly under sensor faults, so that the pressure of the OTSG stabilizes near the set-point value and the OTSG runs normally and stably.
https://doi.org/10.1016/j.net.2022.04.014 KSCI

A UAV Spoofing Path Guided Simulation Using Reinforcement Learning (강화학습을 이용한 무인기 기만 경로 유도 시뮬레이션)

Jae-Kyung Koo;DongSun Lee;Chang-Ok Kang;Seungho Choi;Il Kyu Park
- Journal of Positioning, Navigation, and Timing, v.13 no.4, pp.497-503, 2024
In preparation for cases where Unmanned Aerial Vehicles (UAVs) are abused for surveillance or terrorism, this study proposes a technique to guide a UAV to a target point using a spoofing signal that interferes with the Global Navigation Satellite System (GNSS). However, the waypoint estimation-based approach used in spoofing requires repetitive computations, making real-time processing challenging and reducing its responsiveness to target point changes. This paper proposes a technique that uses reinforcement learning to guide UAV spoofing paths in real time by dynamically learning and adapting to changes in flight states without the need for waypoint estimation. To effectively learn real-time flight state change data, the Advantage Actor-Critic (A2C) reinforcement learning model is utilized. A spoofing path guidance simulation that controls flight in real time through reinforcement learning was developed. The proposed reinforcement learning model was applied and verified through a simulation experiment in which the target point of guided spoofing was changed.
https://doi.org/10.11003/JPNT.2024.13.4.497

A Joint Allocation Algorithm of Computing and Communication Resources Based on Reinforcement Learning in MEC System

Liu, Qinghua;Li, Qingping
- Journal of Information Processing Systems, v.17 no.4, pp.721-736, 2021
For a mobile edge computing (MEC) system supporting dense networks, a joint allocation algorithm of computing and communication resources based on reinforcement learning is proposed. The energy consumption of task execution is defined as the maximum energy consumption of each user's task execution in the system. Considering the constraints of task unloading, power allocation, transmission rate, and computing resource allocation, the problem of joint task unloading and resource allocation is modeled as the problem of minimizing the maximum task execution energy consumption. As a mixed integer nonlinear programming problem, it is difficult to solve directly with traditional optimization methods, so this paper uses a reinforcement learning algorithm to solve it. The Markov decision process and the theoretical basis of reinforcement learning are then introduced to provide a theoretical basis for the algorithm simulation experiment. Based on the reinforcement learning algorithm and the joint allocation of communication resources, the joint optimization of data task unloading and power control strategy is carried out for each terminal device, and the local computing model and task unloading model are built. The simulation results show that the total task computation cost of the proposed algorithm is 5%-10% less than that of the two comparison algorithms under the same task input. At the same time, the total task computation cost of the proposed algorithm is more than 5% less than that of the two new comparison algorithms.
https://doi.org/10.3745/JIPS.01.0079 KSCI

