• Title/Summary/Keyword: reinforcement algorithms

Search Result 148, Processing Time 0.027 seconds

Control of Crawling Robot using Actor-Critic Fuzzy Reinforcement Learning (액터-크리틱 퍼지 강화학습을 이용한 기는 로봇의 제어)

  • Moon, Young-Joon;Lee, Jae-Hoon;Park, Joo-Young
    • Journal of the Korean Institute of Intelligent Systems
    • /
    • v.19 no.4
    • /
    • pp.519-524
    • /
    • 2009
  • Recently, reinforcement learning methods have drawn much interests in the area of machine learning. Dominant approaches in researches for the reinforcement learning include the value-function approach, the policy search approach, and the actor-critic approach, among which pertinent to this paper are algorithms studied for problems with continuous states and continuous actions along the line of the actor-critic strategy. In particular, this paper focuses on presenting a method combining the so-called ACFRL(actor-critic fuzzy reinforcement learning), which is an actor-critic type reinforcement learning based on fuzzy theory, together with the RLS-NAC which is based on the RLS filters and natural actor-critic methods. The presented method is applied to a control problem for crawling robots, and some results are reported from comparison of learning performance.

Pipe Routing using Reinforcement Learning on Initial Design Stage (선박 초기 설계 단계에서의 강화학습을 이용한 배관경로 설계)

  • Shin, Dong-seon;Park, Byeong-cheol;Lim, Chae-og;Oh, Sang-jin;Kim, Gi-yong;Shin, Sung-chul
    • Journal of the Society of Naval Architects of Korea
    • /
    • v.57 no.4
    • /
    • pp.191-197
    • /
    • 2020
  • Pipe routing is a important part of the whole design process in the shipbuilding industry. It has a lot of constraints and many tasks that should be considered together. Also, the result of this stage affects follow-up works in a wide scope. Therefore, this part requires skilled designers and a lot of time. This study aims to reduce the workload and time during the design process by automating the pipe route design on initial stage. In this study, the reinforcement learning was used for pipe auto-routing. Reinforcement learning has the advantage of dynamically selecting routes, unlike existing algorithms. Therefore, it is suitable for the pipe routing design in ship design process which is frequently modified. At last, the effectiveness of this study was verified by comparing pipelines which were designed by piping designer and reinforcement learning results.

Q-Learning Policy and Reward Design for Efficient Path Selection (효율적인 경로 선택을 위한 Q-Learning 정책 및 보상 설계)

  • Yong, Sung-Jung;Park, Hyo-Gyeong;You, Yeon-Hwi;Moon, Il-Young
    • Journal of Advanced Navigation Technology
    • /
    • v.26 no.2
    • /
    • pp.72-77
    • /
    • 2022
  • Among the techniques of reinforcement learning, Q-Learning means learning optimal policies by learning Q functions that perform actionsin a given state and predict future efficient expectations. Q-Learning is widely used as a basic algorithm for reinforcement learning. In this paper, we studied the effectiveness of selecting and learning efficient paths by designing policies and rewards based on Q-Learning. In addition, the results of the existing algorithm and punishment compensation policy and the proposed punishment reinforcement policy were compared by applying the same number of times of learning to the 8x8 grid environment of the Frozen Lake game. Through this comparison, it was analyzed that the Q-Learning punishment reinforcement policy proposed in this paper can significantly increase the learning speed compared to the application of conventional algorithms.

Enhancing Smart Grid Efficiency through SAC Reinforcement Learning: Renewable Energy Integration and Optimal Demand Response in the CityLearn Environment (SAC 강화 학습을 통한 스마트 그리드 효율성 향상: CityLearn 환경에서 재생 에너지 통합 및 최적 수요 반응)

  • Esanov Alibek Rustamovich;Seung Je Seong;Chang-Gyoon Lim
    • The Journal of the Korea institute of electronic communication sciences
    • /
    • v.19 no.1
    • /
    • pp.93-104
    • /
    • 2024
  • Demand response is a strategy that encourages customers to adjust their consumption patterns at times of peak demand with the aim to improve the reliability of the power grid and minimize expenses. The integration of renewable energy sources into smart grids poses significant challenges due to their intermittent and unpredictable nature. Demand response strategies, coupled with reinforcement learning techniques, have emerged as promising approaches to address these challenges and optimize grid operations where traditional methods fail to meet such kind of complex requirements. This research focuses on investigating the application of reinforcement learning algorithms in demand response for renewable energy integration. The objectives include optimizing demand-side flexibility, improving renewable energy utilization, and enhancing grid stability. The results emphasize the effectiveness of demand response strategies based on reinforcement learning in enhancing grid flexibility and facilitating the integration of renewable energy.

Behavior Learning and Evolution of Swarm Robot System using Support Vector Machine (SVM을 이용한 군집로봇의 행동학습 및 진화)

  • Seo, Sang-Wook;Yang, Hyun-Chang;Sim, Kwee-Bo
    • Journal of the Korean Institute of Intelligent Systems
    • /
    • v.18 no.5
    • /
    • pp.712-717
    • /
    • 2008
  • In swarm robot systems, each robot must act by itself according to the its states and environments, and if necessary, must cooperate with other robots in order to carry out a given task. Therefore it is essential that each robot has both learning and evolution ability to adapt the dynamic environments. In this paper, reinforcement learning method with SVM based on structural risk minimization and distributed genetic algorithms is proposed for behavior learning and evolution of collective autonomous mobile robots. By distributed genetic algorithm exchanging the chromosome acquired under different environments by communication each robot can improve its behavior ability. Specially, in order to improve the performance of evolution, selective crossover using the characteristic of reinforcement learning that basis of SVM is adopted in this paper.

Co-Operative Strategy for an Interactive Robot Soccer System by Reinforcement Learning Method

  • Kim, Hyoung-Rock;Hwang, Jung-Hoon;Kwon, Dong-Soo
    • International Journal of Control, Automation, and Systems
    • /
    • v.1 no.2
    • /
    • pp.236-242
    • /
    • 2003
  • This paper presents a cooperation strategy between a human operator and autonomous robots for an interactive robot soccer game, The interactive robot soccer game has been developed to allow humans to join into the game dynamically and reinforce entertainment characteristics. In order to make these games more interesting, a cooperation strategy between humans and autonomous robots on a team is very important. Strategies can be pre-programmed or learned by robots themselves with learning or evolving algorithms. Since the robot soccer system is hard to model and its environment changes dynamically, it is very difficult to pre-program cooperation strategies between robot agents. Q-learning - one of the most representative reinforcement learning methods - is shown to be effective for solving problems dynamically without explicit knowledge of the system. Therefore, in our research, a Q-learning based learning method has been utilized. Prior to utilizing Q-teaming, state variables describing the game situation and actions' sets of robots have been defined. After the learning process, the human operator could play the game more easily. To evaluate the usefulness of the proposed strategy, some simulations and games have been carried out.

Robot Control via RPO-based Reinforcement Learning Algorithm (RPO 기반 강화학습 알고리즘을 이용한 로봇제어)

  • Kim, Jong-Ho;Kang, Dae-Sung;Park, Joo-Young
    • Journal of the Korean Institute of Intelligent Systems
    • /
    • v.15 no.4
    • /
    • pp.505-510
    • /
    • 2005
  • The RPO(randomized policy optimizer) algorithm, which utilizes probabilistic policy for the action selection, is a recently developed tool in the area of reinforcement learning, and has been shown to be very successful in several application problems. In this paper, we propose a modified RPO algorithm, whose critic network is adapted via RLS(Recursive Least Square) algorithm. In order to illustrate the applicability of the modified RPO method, we applied the modified algorithm to Kimura's robot and observed very good performance. We also developed a MATLAB-based animation program, by which the effectiveness of the training algorithms on the acceleration or the robot movement were observed.

Time-varying Proportional Navigation Guidance using Deep Reinforcement Learning (심층 강화학습을 이용한 시변 비례 항법 유도 기법)

  • Chae, Hyeok-Joo;Lee, Daniel;Park, Su-Jeong;Choi, Han-Lim;Park, Han-Sol;An, Kyeong-Soo
    • Journal of the Korea Institute of Military Science and Technology
    • /
    • v.23 no.4
    • /
    • pp.399-406
    • /
    • 2020
  • In this paper, we propose a time-varying proportional navigation guidance law that determines the proportional navigation gain in real-time according to the operating situation. When intercepting a target, an unidentified evasion strategy causes a loss of optimality. To compensate for this problem, proper proportional navigation gain is derived at every time step by solving an optimal control problem with the inferred evader's strategy. Recently, deep reinforcement learning algorithms are introduced to deal with complex optimal control problem efficiently. We adapt the actor-critic method to build a proportional navigation gain network and the network is trained by the Proximal Policy Optimization(PPO) algorithm to learn an evasion strategy of the target. Numerical experiments show the effectiveness and optimality of the proposed method.

A Study on Application of Reinforcement Learning Algorithm Using Pixel Data (픽셀 데이터를 이용한 강화 학습 알고리즘 적용에 관한 연구)

  • Moon, Saemaro;Choi, Yonglak
    • Journal of Information Technology Services
    • /
    • v.15 no.4
    • /
    • pp.85-95
    • /
    • 2016
  • Recently, deep learning and machine learning have attracted considerable attention and many supporting frameworks appeared. In artificial intelligence field, a large body of research is underway to apply the relevant knowledge for complex problem-solving, necessitating the application of various learning algorithms and training methods to artificial intelligence systems. In addition, there is a dearth of performance evaluation of decision making agents. The decision making agent that can find optimal solutions by using reinforcement learning methods designed through this research can collect raw pixel data observed from dynamic environments and make decisions by itself based on the data. The decision making agent uses convolutional neural networks to classify situations it confronts, and the data observed from the environment undergoes preprocessing before being used. This research represents how the convolutional neural networks and the decision making agent are configured, analyzes learning performance through a value-based algorithm and a policy-based algorithm : a Deep Q-Networks and a Policy Gradient, sets forth their differences and demonstrates how the convolutional neural networks affect entire learning performance when using pixel data. This research is expected to contribute to the improvement of artificial intelligence systems which can efficiently find optimal solutions by using features extracted from raw pixel data.

A Study on Load Distribution of Gaming Server Using Proximal Policy Optimization (Proximal Policy Optimization을 이용한 게임서버의 부하분산에 관한 연구)

  • Park, Jung-min;Kim, Hye-young;Cho, Sung Hyun
    • Journal of Korea Game Society
    • /
    • v.19 no.3
    • /
    • pp.5-14
    • /
    • 2019
  • The gaming server is based on a distributed server. In order to distribute workloads of gaming servers, distributed gaming servers apply some algorithms which divide each of gaming server's workload into balanced workload among the gaming servers and as a result, efficiently manage response time and fusibility of server requested by the clients. In this paper, we propose a load balancing agent using PPO(Proximal Policy Optimization) which is one of the methods from a greedy algorithm and Policy Gradient which is from Reinforcement Learning. The proposed load balancing agent is compared with the previous researches based on the simulation.