Search | Korea Science

Controller Learning Method of Self-driving Bicycle Using State-of-the-art Deep Reinforcement Learning Algorithms

Choi, Seung-Yoon;Le, Tuyen Pham;Chung, Tae-Choong
- Journal of the Korea Society of Computer and Information / v.23 no.10 / pp.23-31 / 2018
Machine learning has been studied extensively in recent years, and reinforcement learning is a particularly active area. In this study, we propose a bicycle controller based on DDPG (Deep Deterministic Policy Gradient), a recent deep reinforcement learning algorithm. We redefine the reward ("compensation") function of the bicycle dynamics and the neural network used to train the agent. With the proposed method for learning and control, the bicycle does not fall over and, unlike with the existing method, reaches a more distant given destination. For performance evaluation, we tested the proposed algorithm in various environments such as fixed speed, random, target point, and undetermined. The results confirm that the proposed algorithm performs better than the conventional neural network algorithms NAF and PPO.
https://doi.org/10.9708/jksci.2018.23.10.023

A Joint Allocation Algorithm of Computing and Communication Resources Based on Reinforcement Learning in MEC System

Liu, Qinghua;Li, Qingping
- Journal of Information Processing Systems / v.17 no.4 / pp.721-736 / 2021
A joint allocation algorithm for computing and communication resources, based on reinforcement learning, is proposed for mobile edge computing (MEC) systems that support dense networks. The energy consumption of task execution is defined as the maximum energy consumed by any user's task execution in the system. Under constraints on task offloading, power allocation, transmission rate, and computing resource allocation, joint task offloading and resource allocation is modeled as the problem of minimizing this maximum task execution energy consumption. Because this is a mixed-integer nonlinear programming problem, it is difficult to solve directly with traditional optimization methods, so this paper solves it with a reinforcement learning algorithm. The Markov decision process and the theoretical basis of reinforcement learning are introduced to ground the simulation experiments. Using reinforcement learning and joint allocation of communication resources, task offloading and the power control strategy are jointly optimized for each terminal device, and a local computing model and a task offloading model are built. The simulation results show that, under the same task input, the total task computation cost of the proposed algorithm is 5%-10% lower than that of the two comparison algorithms. It is also more than 5% lower than that of the two new comparison algorithms.
https://doi.org/10.3745/JIPS.01.0079
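
The min-max objective described in the abstract above can be written compactly. What follows is only a notational sketch: the symbols $x_i$ (offloading decision of user $i$, called "unloading" in the source text), $p_i$ (transmit power), $f_i$ (allocated computing resource), $E_i$ (task execution energy), $N$ (number of users), and the feasible set $\mathcal{C}$ are assumptions introduced here for illustration and are not taken from the paper.

$$
\min_{\{x_i,\,p_i,\,f_i\}_{i=1}^{N}} \; \max_{1 \le i \le N} \; E_i(x_i, p_i, f_i)
\quad \text{s.t.} \quad (x_i, p_i, f_i)_{i=1}^{N} \in \mathcal{C}, \qquad x_i \in \{0, 1\}
$$

Under these assumptions, the binary decisions $x_i$ combined with continuous $p_i$ and $f_i$ and a nonlinear $E_i$ are what would make such a problem a mixed-integer nonlinear program.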

UAV Path Planning based on Deep Reinforcement Learning using Cell Decomposition Algorithm (셀 분해 알고리즘을 활용한 심층 강화학습 기반 무인 항공기 경로 계획)

Kyoung-Hun Kim;Byungsun Hwang;Joonho Seon;Soo-Hyun Kim;Jin-Young Kim
- The Journal of the Institute of Internet, Broadcasting and Communication / v.24 no.3 / pp.15-20 / 2024
Path planning for unmanned aerial vehicles (UAVs) is crucial for avoiding collisions in complex environments that include both static and dynamic obstacles. Path planning algorithms such as RRT and A* handle static obstacle avoidance effectively, but their computational complexity grows in high-dimensional environments. Reinforcement learning-based algorithms can accommodate complex environments, but like traditional path planning algorithms, they struggle with training complexity and convergence in higher-dimensional environments. In this paper, we propose a reinforcement learning model that uses a cell decomposition algorithm. The proposed model reduces the complexity of the environment by decomposing the learning environment in detail, and it improves obstacle avoidance performance by establishing the valid actions of the agent. This addresses the exploration problem of reinforcement learning and improves the convergence of learning. Simulation results show that the proposed model improves learning speed and path planning efficiency compared to reinforcement learning models in general environments.
https://doi.org/10.7236/JIIBC.2024.24.3.15

Kernel-based actor-critic approach with applications

Chu, Baek-Suk;Jung, Keun-Woo;Park, Joo-Young
- International Journal of Fuzzy Logic and Intelligent Systems / v.11 no.4 / pp.267-274 / 2011
Recently, actor-critic methods have drawn significant interest in the area of reinforcement learning, and several algorithms have been studied along the lines of the actor-critic strategy. In this paper, we consider a new type of actor-critic algorithm that employs kernel methods, which have recently been shown to be very effective tools in various fields of machine learning, and we investigate combining the actor-critic strategy with kernel methods. More specifically, this paper studies actor-critic algorithms that use kernel-based least-squares estimation and policy gradient. In the critic, the study uses a sliding-window-based kernel least-squares method, which leads to fast and efficient value-function estimation in a nonparametric setting. The applicability of the considered algorithms is illustrated via a robot locomotion problem and a tunnel ventilation control problem.
https://doi.org/10.5391/IJFIS.2011.11.4.267

Goal-Directed Reinforcement Learning System (목표지향적 강화학습 시스템)

Lee, Chang-Hoon
- The Journal of the Institute of Internet, Broadcasting and Communication / v.10 no.5 / pp.265-270 / 2010
Reinforcement learning learns through trial-and-error interaction with a dynamic environment. In dynamic environments, reinforcement learning methods such as TD-learning and TD($\lambda$)-learning therefore learn faster than conventional stochastic learning methods. However, because many of the proposed reinforcement learning algorithms provide the reinforcement value only when the learning agent reaches its goal state, most of them converge to the optimal solution too slowly. In this paper, we present the GDRLS algorithm for finding the shortest path faster in a maze environment. GDRLS selects the candidate states that can guide the agent along the shortest path in the maze and learns only those candidate states to find the shortest path. Experiments show that GDRLS finds the shortest path faster than TD-learning and TD($\lambda$)-learning in a maze environment.
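
The sparse-reward setting this abstract describes, where the agent is rewarded only on reaching the goal state, can be illustrated with a small baseline. The sketch below is a generic one-step TD (tabular Q-learning) learner on an obstacle-free grid; it is not the GDRLS algorithm, and the grid size, hyperparameters, and the function names `step`, `train`, and `greedy_path` are all assumptions made for illustration.

```python
import random

ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]  # up, down, right, left


def step(state, action, size, goal):
    """Move inside a size x size grid; bumping into a wall leaves the state unchanged."""
    x = min(max(state[0] + ACTIONS[action][0], 0), size - 1)
    y = min(max(state[1] + ACTIONS[action][1], 0), size - 1)
    nxt = (x, y)
    done = nxt == goal
    # Sparse reward: 1 only when the goal is reached, 0 everywhere else.
    return nxt, (1.0 if done else 0.0), done


def train(size=4, goal=(3, 3), episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular one-step TD control (Q-learning) with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = {((x, y), a): 0.0 for x in range(size) for y in range(size) for a in range(4)}
    for _ in range(episodes):
        state = (0, 0)
        for _ in range(200):  # cap on steps per episode
            if rng.random() < epsilon:
                action = rng.randrange(4)
            else:
                # Greedy choice with random tie-breaking among equal values.
                best = max(q[(state, a)] for a in range(4))
                action = rng.choice([a for a in range(4) if q[(state, a)] == best])
            nxt, reward, done = step(state, action, size, goal)
            # TD target: reward plus the discounted best value of the next state.
            bootstrap = 0.0 if done else gamma * max(q[(nxt, a)] for a in range(4))
            q[(state, action)] += alpha * (reward + bootstrap - q[(state, action)])
            state = nxt
            if done:
                break
    return q


def greedy_path(q, size=4, goal=(3, 3), limit=100):
    """Follow the highest-valued action from (0, 0) until the goal or the step limit."""
    path = [(0, 0)]
    while path[-1] != goal and len(path) <= limit:
        state = path[-1]
        action = max(range(4), key=lambda a: q[(state, a)])
        path.append(step(state, action, size, goal)[0])
    return path


if __name__ == "__main__":
    print(greedy_path(train()))
```

Because the only nonzero reward sits at the goal, value information spreads backward from the goal one state at a time, which is the slow convergence the abstract attributes to goal-only reinforcement.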

Adaptive Modular Q-Learning for Agents' Dynamic Positioning in Robot Soccer Simulation

Kwon, Ki-Duk;Kim, In-Cheol
- Institute of Control, Robotics and Systems (제어로봇시스템학회): Conference Proceedings / 2001.10a / pp.149.5-149 / 2001
The robot soccer simulation game is a dynamic multi-agent environment. In this paper, we suggest a new reinforcement learning approach to each agent's dynamic positioning in such a dynamic environment. Reinforcement learning is a form of machine learning in which an agent learns, from indirect and delayed reward, an optimal policy for choosing sequences of actions that produce the greatest cumulative reward. Reinforcement learning therefore differs from supervised learning in that no input-output pairs are presented as training examples. Furthermore, model-free reinforcement learning algorithms such as Q-learning do not require defining or learning any model of the surrounding environment. Nevertheless ...

Performance Improvement of Genetic Programming Based on Reinforcement Learning (강화학습에 의한 유전자 프로그래밍의 성능 개선)

전효병;이동욱;심귀보
- Journal of the Korean Institute of Intelligent Systems / v.8 no.3 / pp.1-8 / 1998
This paper proposes reinforcement genetic programming, which applies the reinforcement learning method to improve the performance of genetic programming. Genetic programming, whose programs have a tree structure, is highly flexible in expressing problems because, unlike other evolutionary algorithms, it places no limit on chromosome size. However, this same characteristic often leads to poorer convergence under mutation and crossover operations. As a result, the population size and the maximum number of generations are typically larger than in other evolutionary algorithms. This paper proposes a new method that performs crossover and mutation operations based on the reinforcement and inhibition mechanism of reinforcement learning. The validity of the proposed method is evaluated by applying it to the artificial ant problem.

Developing a new mutation operator to solve the RC deep beam problems by aid of genetic algorithm

Kaya, Mustafa
- Computers and Concrete / v.22 no.5 / pp.493-500 / 2018
Deep beams have a much larger height-to-span ratio than normal beams and behave differently, which makes their design and analysis difficult. In this study, the optimum horizontal and vertical reinforcement diameters of five different beams were determined using genetic algorithms (GA), taking into account the span/height ratio (L/h), the loading condition, and the presence of openings in the beam body. The effect of different mutation operators and of an improved double times sensitive mutation (DTM) operator on GA performance was also investigated. Random mutation (RM), boundary mutation (BM), non-uniform random mutation (NRM), Makinen, Periaux and Toivanen (MPT) mutation, power mutation (PM), polynomial mutation (PNM), and the developed DTM mutation operator were applied to five deep beam problems to determine the minimum reinforcement diameter. The fitness values obtained with the developed DTM operator were higher than those obtained with the existing mutation operators. Moreover, the reinforcement weight of the deep beams obtained with the developed DTM operator was lower than that obtained with the existing operators. Across the analyses, the highest fitness value was obtained with the DTM operator. In addition, the study, which was carried out using GAs, was found to contribute to solving the problems encountered in the design of deep beams.
https://doi.org/10.12989/cac.2018.22.5.493

A Survey on Recent Advances in Multi-Agent Reinforcement Learning (멀티 에이전트 강화학습 기술 동향)

Yoo, B.H.;Ningombam, D.D.;Kim, H.W.;Song, H.J.;Park, G.M.;Yi, S.
- Electronics and Telecommunications Trends / v.35 no.6 / pp.137-149 / 2020
Several multi-agent reinforcement learning (MARL) algorithms have achieved impressive results in recent years. They have demonstrated their potential for solving complex problems in real-time strategy online games, robotics, and autonomous vehicles. However, these algorithms face many challenges when dealing with massive problem spaces in sparse-reward environments. Based on the centralized training and decentralized execution (CTDE) architecture, the MARL algorithms discussed in the literature aim to solve the current challenges by formulating novel concepts of inter-agent modeling, credit assignment, multi-agent communication, and the exploration-exploitation dilemma. The fundamental objective of this paper is to deliver a comprehensive survey of existing MARL algorithms organized by problem statement rather than by technology. We also discuss several experimental frameworks to provide insight into the use of these algorithms and to motivate some promising directions for future research.
https://doi.org/10.22648/ETRI.2020.J.350614

Survey on Communication Algorithms for Multiagent Reinforcement Learning (멀티에이전트 강화학습을 위한 통신 기술 동향)

S.W. Seo;Y.H. Shin;B.H. Yoo;H.W. Kim;H.J. Song;S. Yi
- Electronics and Telecommunications Trends / v.38 no.4 / pp.104-115 / 2023
Communication for multiagent reinforcement learning (MARL) has emerged to promote understanding of an entire environment. Through communication for MARL, agents can cooperate by choosing the best action considering not only their surrounding environment but also the entire environment and other agents. Hence, MARL with communication may outperform conventional MARL. Many communication algorithms have been proposed to support MARL, but current analyses remain insufficient. This paper presents existing communication algorithms for MARL according to various criteria such as communication methods, contents, and restrictions. In addition, we consider several experimental environments that are primarily used to demonstrate the MARL performance enhanced by communication.
https://doi.org/10.22648/ETRI.2023.J.380410
