• Title/Abstract/Keyword: reinforcement algorithms

Search results: 148 items (processing time: 0.022 seconds)

백스터 로봇의 시각기반 로봇 팔 조작 딥러닝을 위한 강화학습 알고리즘 구현 (Implementation of End-to-End Training of Deep Visuomotor Policies for Manipulation of a Robotic Arm of Baxter Research Robot)

  • 김성운;김솔아;하파엘 리마;최재식
    • 로봇학회논문지 / Vol. 14 No. 1 / pp.40-49 / 2019
  • Reinforcement learning has been applied to various problems in robotics. However, training complex robotic manipulation tasks has remained difficult, since there are few models applicable to general tasks and such general models require a large number of training episodes. For these reasons, deep neural networks, which have been shown to be good function approximators, have not been actively used for robot manipulation tasks. Recently, some of these challenges have been addressed by methods such as Guided Policy Search, which guide or limit search directions while training a deep neural network based policy model. These frameworks have already been applied to the humanoid robot PR2. However, in robotics it is not trivial to adapt an algorithm designed for one robot to another robot. In this paper, we present our implementation of Guided Policy Search on the robotic arms of the Baxter Research Robot. To meet the goals and needs of the project, we build a Baxter Agent class for the existing Guided Policy Search algorithm code, using the robot's built-in Python interface. This work is expected to play an important role in popularizing reinforcement learning methods for robot manipulation on cost-effective robot platforms.
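
As context for what such a port involves (this is not taken from the paper), the sketch below shows a hypothetical agent class exposing a robot arm to a GPS-style trainer: reset to an initial posture, read a state vector, and roll out a policy for a fixed horizon. All names here (BaxterAgentSketch, limb_interface, policy.act, command_torques) are illustrative assumptions, not the paper's code or the actual Baxter SDK API.

```python
# A rough sketch, not the paper's implementation: a hypothetical agent class
# that a Guided-Policy-Search-style trainer could call. The limb interface
# methods used below are placeholders, not the real Baxter SDK API.
import numpy as np

class BaxterAgentSketch:
    def __init__(self, limb_interface, horizon=100):
        self.limb = limb_interface   # assumed wrapper around one Baxter arm
        self.T = horizon             # time steps per episode

    def reset(self, initial_joint_angles):
        """Move the arm to the episode's initial configuration."""
        self.limb.move_to_joint_positions(initial_joint_angles)

    def get_state(self):
        """State vector: joint angles concatenated with joint velocities."""
        q = np.asarray(self.limb.joint_angles())
        dq = np.asarray(self.limb.joint_velocities())
        return np.concatenate([q, dq])

    def sample(self, policy, initial_joint_angles):
        """Roll out a policy for T steps; return visited states and actions."""
        self.reset(initial_joint_angles)
        states, actions = [], []
        for t in range(self.T):
            x = self.get_state()
            u = policy.act(x, t)          # e.g. a joint torque command
            self.limb.command_torques(u)  # placeholder actuation call
            states.append(x)
            actions.append(u)
        return np.array(states), np.array(actions)
```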

Multicast Tree Generation using Meta Reinforcement Learning in SDN-based Smart Network Platforms

  • Chae, Jihun;Kim, Namgi
    • KSII Transactions on Internet and Information Systems (TIIS) / Vol. 15 No. 9 / pp.3138-3150 / 2021
  • Multimedia services on the Internet are continuously increasing, and so is the demand for technologies that deliver multimedia traffic efficiently. The multicast technique, which delivers the same content to several destinations, is being developed continuously. It delivers content from a source to all destinations through a multicast tree, and a low-cost multicast tree increases the utilization of network resources. However, finding the optimal multicast tree with minimum link cost is very difficult: its computational complexity is that of the Steiner tree problem, which is NP-complete. Therefore, an effective way to obtain a low-cost multicast tree with less computation time is needed on SDN-based smart network platforms. In this paper, we propose a new multicast tree generation algorithm that produces a multicast tree using an agent trained by model-based meta reinforcement learning. Experiments verified that the proposed algorithm generated multicast trees in less time than existing approximation algorithms, and that it produced lower-cost multicast trees in a dynamic network environment than a previous DQN-based algorithm.
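
For comparison, the following is a minimal sketch (assuming a weighted networkx graph) of the classic shortest-path heuristic for building a low-cost multicast/Steiner tree, i.e. the kind of approximation baseline such an RL tree-generation agent is typically evaluated against; it is not the paper's meta-RL algorithm.

```python
# A minimal sketch (not the paper's algorithm) of the shortest-path heuristic
# for multicast/Steiner trees: repeatedly attach the destination that is
# cheapest to reach from the current tree.
import networkx as nx

def multicast_tree_sketch(graph, source, destinations, weight="cost"):
    """Greedily connect each destination to the growing tree via the
    cheapest shortest path; returns the set of tree edges."""
    tree_nodes = {source}
    tree_edges = set()
    remaining = set(destinations)
    while remaining:
        best_path, best_cost = None, float("inf")
        for d in remaining:
            for t in tree_nodes:
                try:
                    cost, path = nx.single_source_dijkstra(graph, t, d, weight=weight)
                except nx.NetworkXNoPath:
                    continue
                if cost < best_cost:
                    best_cost, best_path = cost, path
        if best_path is None:
            raise ValueError("some destination is unreachable")
        tree_nodes.update(best_path)
        tree_edges.update(zip(best_path, best_path[1:]))
        remaining.discard(best_path[-1])
    return tree_edges
```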

A hidden anti-jamming method based on deep reinforcement learning

  • Wang, Yifan;Liu, Xin;Wang, Mei;Yu, Yu
    • KSII Transactions on Internet and Information Systems (TIIS) / Vol. 15 No. 9 / pp.3444-3457 / 2021
  • In the field of anti-jamming based on dynamic spectrum, most methods try to improve the ability to avoid jamming and seldom consider whether the jammer can perceive the user's signal. Although these existing methods work in some anti-jamming scenarios, their long-term performance may degrade when an intelligent jammer learns the user's waveform or decision information from the user's historical activities. Hence, we propose a hidden anti-jamming method that addresses this problem by reducing the probability that the jammer senses the user. In the proposed method, the action correlation between the user and the jammer is used to evaluate the hiding effect of the user's actions, and a deep reinforcement learning framework, including a specific action-correlation calculation and an iterative learning algorithm, is designed to maximize the user's hiding and communication performance simultaneously. Simulation results show that the proposed algorithm significantly reduces the jammer's sensing probability and slightly improves the user's anti-jamming performance compared to existing jamming-avoidance algorithms.
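
As a rough illustration only, the sketch below shows one plausible way to quantify how strongly a jammer's channel choices track the user's recent choices and to fold that "hiding" term into an RL reward; the lag, weighting, and reward form are assumptions, not the paper's exact formulation.

```python
# Illustrative sketch: an action-correlation measure between user and jammer
# channel histories, combined with throughput into a single reward signal.
import numpy as np

def action_correlation(user_channels, jammer_channels, lag=1):
    """Fraction of steps where the jammer lands on the channel the user
    occupied `lag` steps earlier (high value = user is being tracked)."""
    u = np.asarray(user_channels)[:-lag]
    j = np.asarray(jammer_channels)[lag:]
    return float(np.mean(u == j)) if len(u) else 0.0

def hidden_antijam_reward(throughput, user_hist, jammer_hist, alpha=0.5):
    """Trade off communication performance against detectability."""
    return throughput - alpha * action_correlation(user_hist, jammer_hist)
```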

지도학습과 강화학습을 이용한 준능동 중간층면진시스템의 최적설계 (Optimal Design of Semi-Active Mid-Story Isolation System using Supervised Learning and Reinforcement Learning)

  • 강주원;김현수
    • 한국공간구조학회논문집 / Vol. 21 No. 4 / pp.73-80 / 2021
  • A mid-story isolation system has been proposed for seismic response reduction of high-rise buildings and has shown good control performance. The control performance of a mid-story isolation system is further enhanced by introducing semi-active control devices into the isolation system, and the seismic response reduction capacity of a semi-active mid-story isolation system mainly depends on the effectiveness of its control algorithm. In this study, an AI (Artificial Intelligence)-based control algorithm was developed for a semi-active mid-story isolation system. The Shiodome Sumitomo building in Japan, a real structure with a mid-story isolation system, was used as the example structure, and an MR (magnetorheological) damper was used to form the semi-active mid-story isolation system in the example model. In the numerical simulation, a seismic response prediction model was generated with a supervised learning model, an RNN (Recurrent Neural Network), and a Deep Q-Network (DQN), a reinforcement learning algorithm, was employed to develop the control algorithm. The numerical simulation results show that the DQN algorithm can effectively control a semi-active mid-story isolation system, successfully reducing seismic responses.
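
Purely as an orientation aid (not the paper's model), the sketch below shows a minimal DQN-style action selection for a semi-active device, where the discrete actions are candidate MR damper command voltages; the network size, state contents, and voltage levels are assumptions.

```python
# Illustrative sketch: epsilon-greedy selection of an MR damper command
# voltage from a small Q-network. Not the paper's architecture or settings.
import torch
import torch.nn as nn

VOLTAGE_LEVELS = [0.0, 1.0, 2.0, 3.0]   # hypothetical damper command set (V)

class QNet(nn.Module):
    def __init__(self, state_dim, n_actions=len(VOLTAGE_LEVELS)):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, n_actions),
        )

    def forward(self, state):
        return self.net(state)

def select_voltage(qnet, state, epsilon=0.05):
    """Epsilon-greedy choice of the damper command voltage."""
    if torch.rand(1).item() < epsilon:
        idx = torch.randint(len(VOLTAGE_LEVELS), (1,)).item()
    else:
        with torch.no_grad():
            idx = qnet(torch.as_tensor(state, dtype=torch.float32)).argmax().item()
    return VOLTAGE_LEVELS[idx]
```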

The Development of an Intelligent Home Energy Management System Integrated with a Vehicle-to-Home Unit using a Reinforcement Learning Approach

  • Ohoud Almughram;Sami Ben Slama;Bassam Zafar
    • International Journal of Computer Science & Network Security / Vol. 24 No. 4 / pp.87-106 / 2024
  • Vehicle-to-Home (V2H) and Home Centralized Photovoltaic (HCPV) systems can address various energy storage issues and enhance demand response programs. Renewable energy sources, such as solar energy and wind turbines, help address the energy gap. However, no energy management system is currently available to regulate the uncertainty of renewable energy sources, electric vehicles, and appliance consumption within a smart microgrid. Therefore, this study investigated the impact of solar photovoltaic (PV) panels, electric vehicles, and Micro-Grid (MG) storage on maximum solar radiation hours. Several Deep Learning (DL) algorithms were applied to account for the uncertainty. Moreover, a Reinforcement Learning HCPV (RL-HCPV) algorithm was created for efficient real-time energy scheduling decisions. The proposed algorithm manages the energy demand between PV solar energy generation and vehicle energy storage. RL-HCPV was modeled under several constraints to meet household electricity demand in sunny and cloudy weather. Simulations demonstrated how the proposed RL-HCPV system can efficiently handle demand response and how V2H can help smooth the appliance load profile and reduce power consumption costs with sustainable power generation. The results demonstrate the advantages of utilizing RL and V2H as a potential storage technology for smart buildings.
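
As a toy illustration only, the sketch below shows a tabular Q-learning update for a simplified V2H scheduling decision (charge the EV, discharge it to the home, or idle); the state encoding, action set, and reward are assumptions and the paper's RL-HCPV model is far more detailed.

```python
# Illustrative sketch: tabular Q-learning for a toy V2H scheduling problem.
import numpy as np

ACTIONS = ["charge_ev", "discharge_ev_to_home", "idle"]

class V2HSchedulerSketch:
    def __init__(self, n_states, alpha=0.1, gamma=0.95, epsilon=0.1):
        self.q = np.zeros((n_states, len(ACTIONS)))
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def act(self, state):
        """Epsilon-greedy action index for the current discretized state."""
        if np.random.rand() < self.epsilon:
            return np.random.randint(len(ACTIONS))
        return int(np.argmax(self.q[state]))

    def update(self, state, action, reward, next_state):
        """Standard Q-learning temporal-difference update."""
        target = reward + self.gamma * np.max(self.q[next_state])
        self.q[state, action] += self.alpha * (target - self.q[state, action])
```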

웹의 연결구조로부터 Hub와 Authority를 효과적으로 도출하기 위한 상호강화모델의 확장 (An Extended Mutual Reinforcement Model for Finding Hubs and Authorities from Link Structures on the WWW)

  • 황인수
    • 한국경영과학회지 / Vol. 30 No. 2 / pp.1-11 / 2005
  • The network structure of a hyperlinked environment can be a rich source of information about its contents and provides an effective means of understanding it. Recently, a number of algorithms have been proposed that analyze hypertext link structure to determine the best authorities for a given topic or query. In this paper, we review the mutual reinforcement algorithm for finding hubs and authorities on the World Wide Web, and we suggest SHA, a new approach to link-structure analysis that uses the relationships among a set of authoritative pages, a set of hub pages, and a set of super hub pages.
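
For reference, the following is a minimal sketch of the classic mutual-reinforcement (HITS-style) iteration for hubs and authorities that the paper reviews; it does not include the paper's super-hub extension (SHA).

```python
# A minimal sketch of the standard hub/authority mutual-reinforcement update.
import numpy as np

def hits(adjacency, iterations=50):
    """adjacency[i, j] = 1 if page i links to page j.
    Returns (hub_scores, authority_scores)."""
    A = np.asarray(adjacency, dtype=float)
    n = A.shape[0]
    hubs = np.ones(n)
    auths = np.ones(n)
    for _ in range(iterations):
        auths = A.T @ hubs   # good authorities are pointed to by good hubs
        hubs = A @ auths     # good hubs point to good authorities
        auths /= np.linalg.norm(auths) or 1.0
        hubs /= np.linalg.norm(hubs) or 1.0
    return hubs, auths
```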

Improved numerical approach for the bond-slip behavior under cyclic loads

  • Kwak, H.G.
    • Structural Engineering and Mechanics / Vol. 5 No. 5 / pp.663-677 / 1997
  • Bond-slip behavior between reinforcement and concrete under push-pull cyclic loading is numerically investigated on the basis of a reinforcement model proposed in this paper. The equivalent reinforcing steel model, which accounts for the bond-slip effect without introducing double nodes, is derived from the equilibrium at each steel node and the compatibility condition between steel and concrete. In addition, a transformation algorithm is constructed to transfer forces and displacements from the nodes of the steel element to the nodes of the concrete element. The model is effective for complex steel arrangements in which steel elements cross the sides of concrete elements, and it makes it practical to consider the bond-slip effect in three-dimensional finite element analysis. Finally, correlation studies between numerical and experimental results under continuously repeated large-deformation stages demonstrate the validity of the developed reinforcing steel model and the adopted algorithms.
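
As a very rough illustration of the basic idea behind such a node-to-node transfer (not the paper's algorithm), the sketch below distributes a force acting at an embedded steel node to the four corner nodes of the containing concrete element using bilinear shape functions; it assumes the steel node's natural coordinates are already known.

```python
# Illustrative sketch: statically equivalent distribution of an embedded-node
# force to the corner nodes of a 4-node quadrilateral element.
import numpy as np

def bilinear_shape_functions(xi, eta):
    """Shape functions of a 4-node quadrilateral at natural coords (xi, eta)."""
    return 0.25 * np.array([(1 - xi) * (1 - eta),
                            (1 + xi) * (1 - eta),
                            (1 + xi) * (1 + eta),
                            (1 - xi) * (1 + eta)])

def distribute_force_to_concrete_nodes(force_xy, xi, eta):
    """Return the equivalent nodal forces as a 4 x 2 array (one row per node)."""
    N = bilinear_shape_functions(xi, eta)
    return np.outer(N, np.asarray(force_xy))
```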

Approximate Dynamic Programming Strategies and Their Applicability for Process Control: A Review and Future Directions

  • Lee, Jong-Min;Lee, Jay H.
    • International Journal of Control, Automation, and Systems / Vol. 2 No. 3 / pp.263-278 / 2004
  • This paper reviews dynamic programming (DP), surveys approximate solution methods for it, and considers their applicability to process control problems. Reinforcement Learning (RL) and Neuro-Dynamic Programming (NDP), which can be viewed as approximate DP techniques, are already established techniques for solving difficult multi-stage decision problems in the fields of operations research, computer science, and robotics. Owing to the significant disparity in problem formulations and objectives, however, the algorithms and techniques available from these fields are not directly applicable to process control problems, and reformulations based on an accurate understanding of these techniques are needed. We categorize the currently available approximate solution techniques for dynamic programming and identify those most suitable for process control problems. Several open issues are also identified and discussed.
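
For orientation, the following is a minimal sketch of the exact dynamic-programming recursion (value iteration) that approximate DP and RL methods seek to approximate when the state space is too large to enumerate; the tabular formulation here is generic, not drawn from the paper.

```python
# A minimal sketch of tabular value iteration for a finite MDP.
import numpy as np

def value_iteration(P, R, gamma=0.95, tol=1e-8):
    """P[a][s, s']: transition probabilities, R[a][s]: expected rewards.
    Returns the optimal value function and a greedy policy."""
    n_actions = len(P)
    V = np.zeros(P[0].shape[0])
    while True:
        Q = np.array([R[a] + gamma * P[a] @ V for a in range(n_actions)])
        V_new = Q.max(axis=0)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=0)
        V = V_new
```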

RPO 기반 강화학습 알고리즘을 이용한 로봇 제어 (Robot Control via RPO-based Reinforcement Learning Algorithm)

  • 김종호;강대성;박주영
    • 한국지능시스템학회:학술대회논문집 / 한국퍼지및지능시스템학회 2005 Spring Conference Proceedings, Vol. 15 No. 1 / pp.217-220 / 2005
  • The RPO algorithm is a recently developed tool in the area of reinforcement learning, and it has been shown to be very successful in several application problems. In this paper, we consider a robot-control problem solved with a modified RPO algorithm in which the critic network is adapted via the RLS (Recursive Least Squares) algorithm. We also developed a MATLAB-based animation program, with which the effectiveness of the training algorithm was observed.
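
As a general illustration of recursive least-squares adaptation of a critic (not the paper's specific RPO variant), the sketch below performs standard RLS updates for a linear-in-features value estimator; the feature choice, forgetting factor, and class structure are assumptions.

```python
# Illustrative sketch: RLS update of a linear critic toward a given target.
import numpy as np

class RLSCriticSketch:
    def __init__(self, n_features, forgetting=0.99, delta=1e3):
        self.w = np.zeros(n_features)          # critic weights
        self.P = delta * np.eye(n_features)    # inverse correlation matrix
        self.lam = forgetting

    def predict(self, phi):
        return float(self.w @ phi)

    def update(self, phi, target):
        """One RLS step for feature vector phi and scalar target value."""
        phi = np.asarray(phi, dtype=float)
        Pphi = self.P @ phi
        gain = Pphi / (self.lam + phi @ Pphi)
        error = target - self.w @ phi
        self.w += gain * error
        self.P = (self.P - np.outer(gain, Pphi)) / self.lam
```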


Fuzzy Inference-based Reinforcement Learning of Dynamic Recurrent Neural Networks

  • Jun, Hyo-Byung;Sim, Kwee-Bo
    • 한국지능시스템학회논문지 / Vol. 7 No. 5 / pp.60-66 / 1997
  • This paper presents a fuzzy inference-based reinforcement learning algorithm for dynamic recurrent neural networks, which closely resembles the psychological learning process of higher animals. By using the fuzzy inference technique, linguistic and conceptual expressions indirectly influence the controller's action, as is seen in human behavior. The intervals of the fuzzy membership functions are optimized by genetic algorithms. In addition, by using recurrent neural networks composed of dynamic neurons as action-generation networks, the past state as well as the current state is considered when generating an action in a dynamic environment. We show the validity of the proposed learning algorithm by applying it to the inverted pendulum control problem.
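
As a small illustration of the fuzzy ingredients involved (not the paper's controller), the sketch below defines triangular membership functions whose breakpoints could be encoded in a GA chromosome and a simple weighted-average defuzzification for a control output; the rule consequents and parameters are assumptions.

```python
# Illustrative sketch: triangular fuzzy sets with GA-tunable breakpoints and a
# weighted-average (centroid-style) control output.
import numpy as np

def triangular(x, left, peak, right):
    """Membership of x in the triangular set (left < peak < right assumed)."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)

def fuzzy_control(angle, breakpoints):
    """`breakpoints` is a GA-tuned flat list [l1, p1, r1, l2, p2, r2, ...]
    defining each membership set; returns a defuzzified control value."""
    sets = np.reshape(breakpoints, (-1, 3))
    rule_outputs = np.linspace(-1.0, 1.0, len(sets))   # hypothetical consequents
    weights = np.array([triangular(angle, *s) for s in sets])
    total = weights.sum()
    return float(weights @ rule_outputs / total) if total > 0 else 0.0
```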
