• Title/Summary/Keyword: Reinforcement Learning (강화 학습)

Reinforcement Learning based on Deep Deterministic Policy Gradient for Roll Control of Underwater Vehicle (수중운동체의 롤 제어를 위한 Deep Deterministic Policy Gradient 기반 강화학습)

  • Kim, Su Yong; Hwang, Yeon Geol; Moon, Sung Woong
    • Journal of the Korea Institute of Military Science and Technology / v.24 no.5 / pp.558-568 / 2021
  • Conventional underwater vehicle controllers are designed by linearizing the nonlinear dynamics model around a specific motion regime. Because such linear controllers perform unstably in the transient state, various studies have sought to overcome this problem, and recent work has used reinforcement learning to improve transient-state control performance. Reinforcement learning can be broadly divided into value-based and policy-based methods. In this paper, we propose a roll controller for an underwater vehicle based on Deep Deterministic Policy Gradient (DDPG), which learns the control policy and shows stable control performance in various situations and environments. The performance of the proposed DDPG-based roll controller was verified through simulation and compared with existing roll controllers based on PID and on DQN with Normalized Advantage Functions.
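
As a rough illustration of one ingredient of DDPG, the sketch below shows the soft (Polyak) target-network update used in the standard DDPG algorithm. The abstract gives no implementation details, so the function name `soft_update`, the flat-list weight representation, and the default `tau = 0.005` are assumptions made for illustration, not the authors' settings.

```python
def soft_update(target_weights, online_weights, tau=0.005):
    """Move the target weights a fraction tau toward the online weights.

    Hypothetical helper: weights are flat lists of floats here; a real
    DDPG agent would apply the same rule to every actor and critic tensor.
    """
    if len(target_weights) != len(online_weights):
        raise ValueError("weight vectors must have the same length")
    return [
        tau * w_online + (1.0 - tau) * w_target
        for w_target, w_online in zip(target_weights, online_weights)
    ]


if __name__ == "__main__":
    target = [0.0, 0.0]
    online = [1.0, 2.0]
    # Repeated soft updates pull the target slowly toward the online weights.
    for _ in range(3):
        target = soft_update(target, online, tau=0.1)
    print(target)
```

A small `tau` keeps the target networks changing slowly, which is the usual reason this update is preferred over copying the weights outright.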

The improvement of vocational training in juvenile protection agencies (소년원 직업능력개발훈련 개선방안 연구)

  • Byun, Sook young
    • Journal of vocational education research / v.33 no.3 / pp.1-17 / 2014
  • The purposes of this research are to analyze and diagnose the actual condition of vocational ability development training for juvenile delinquents, who are classified as a socially disadvantaged group, and to suggest remedies for their successful return to normal social life. To accomplish these purposes, we surveyed vocational training teachers (23) and juvenile delinquents (533) who had been trained for more than a month. Both surveys focused on improvements in three areas: strengthening vocational training, complementary learning for juvenile delinquents, and securing experts.

A Study on Load Distribution of Gaming Server Using Proximal Policy Optimization (Proximal Policy Optimization을 이용한 게임서버의 부하분산에 관한 연구)

  • Park, Jung-min; Kim, Hye-young; Cho, Sung Hyun
    • Journal of Korea Game Society / v.19 no.3 / pp.5-14 / 2019
  • The gaming server is based on a distributed server architecture. To distribute workloads, distributed gaming servers apply algorithms that divide each gaming server's workload into balanced workloads among the servers and, as a result, efficiently manage the response time and fusibility of the servers requested by clients. In this paper, we propose a load balancing agent using a greedy algorithm and PPO (Proximal Policy Optimization), a Policy Gradient method from Reinforcement Learning. The proposed load balancing agent is compared with previous research through simulation.
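
For readers unfamiliar with PPO, the sketch below shows its clipped surrogate objective in isolation. It is not the paper's load balancing agent: the function names, the batch values, and the default `clip_eps = 0.2` are assumptions for illustration. In a load balancing setting, `ratio` could be the new-to-old policy probability ratio for assigning a request to a given server, which is likewise hypothetical here.

```python
def ppo_clipped_term(ratio, advantage, clip_eps=0.2):
    """Per-sample PPO term: min(r * A, clip(r, 1 - eps, 1 + eps) * A)."""
    clipped_ratio = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
    return min(ratio * advantage, clipped_ratio * advantage)


def ppo_clipped_objective(ratios, advantages, clip_eps=0.2):
    """Average the clipped term over a batch; PPO maximizes this value."""
    if len(ratios) != len(advantages) or not ratios:
        raise ValueError("ratios and advantages must be non-empty and equal length")
    terms = [ppo_clipped_term(r, a, clip_eps) for r, a in zip(ratios, advantages)]
    return sum(terms) / len(terms)


if __name__ == "__main__":
    # Hypothetical batch: probability ratios and advantage estimates.
    print(ppo_clipped_objective([1.5, 0.5, 1.0], [1.0, -1.0, 2.0]))
```

The clipping removes any incentive to push the probability ratio outside `[1 - clip_eps, 1 + clip_eps]`, which limits how far a single update can move the policy.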

Implementation of End-to-End Training of Deep Visuomotor Policies for Manipulation of a Robotic Arm of Baxter Research Robot (백스터 로봇의 시각기반 로봇 팔 조작 딥러닝을 위한 강화학습 알고리즘 구현)

  • Kim, Seongun; Kim, Sol A; de Lima, Rafael; Choi, Jaesik
    • The Journal of Korea Robotics Society / v.14 no.1 / pp.40-49 / 2019
  • Reinforcement learning has been applied to various problems in robotics. However, it has remained hard to train complex robotic manipulation tasks because few models are applicable to general tasks, and such general models require many training episodes. For these reasons, deep neural networks, which have been shown to be good function approximators, have not been actively used for robot manipulation tasks. Recently, some of these challenges have been addressed by methods such as Guided Policy Search, which guide or limit search directions while training a deep-neural-network-based policy model. These frameworks have already been applied to a humanoid robot, the PR2. In robotics, however, it is not trivial to adapt algorithms designed for one robot to another. In this paper, we present our implementation of Guided Policy Search for the robotic arms of the Baxter Research Robot. To meet the goals and needs of the project, we build on an existing implementation of the Baxter Agent class for the Guided Policy Search algorithm code, using the built-in Python interface. This work is expected to play an important role in popularizing reinforcement learning methods for robot manipulation on cost-effective robot platforms.

Smart AGV based on Object Recognition and Task Scheduling (객체인식과 작업 스케줄링 기반 스마트 AGV)

  • Lee, Se-Hoon; Bak, Tae-Yeong; Choi, Kyu-Hyun; So, Won-Bin
    • Proceedings of the Korean Society of Computer Information Conference / 2019.07a / pp.251-252 / 2019
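  • This paper proposes an efficient AGV that offers higher safety than conventional AGVs and is based on task scheduling. The AGV recognizes other AGVs using YOLO, an object recognition algorithm, and automatically enters a refuge area. In addition, using ar_markers, a marker recognition algorithm, it determines whether a location is a loading station or a production process and stops at each marker; when a marker corresponding to a refuge area is recognized and another AGV is also detected, it moves into the refuge area. All of these logs can be viewed on a Spring-based web page via Mobius, and task scheduling commands are also issued from the web page. For the task schedule, the traveling salesman and Bellman-Ford algorithms are applied, after which DQN, a reinforcement learning algorithm, is used to derive the optimal value; that value is stored in a DB so that the AGV can move accordingly. The paper shows that an AGV using YOLO, markers, and the web is superior to conventional AGVs in that it is lighter and does not require large facilities.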

Design and Implementation of an OpenCV-based Digital Doorlock (OpenCV기반 디지털 도어락 시스템의 설계 및 구현)

  • Park, Sang-Young; Kang, Hwa-Young; Lee, Kang-Hee
    • Proceedings of the Korean Society of Computer Information Conference / 2019.07a / pp.321-324 / 2019
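  • In Korea, single-person households have recently been increasing rapidly owing to changes in the life cycle of young people, such as rising unemployment and falling marriage rates, as well as to more people living alone and a growing elderly population. This trend is expected to continue, and interest in security solutions tailored to single-person households is growing. This paper studies the implementation of a digital door lock, a device expected to lend itself to active integration of Internet of Things (IoT) technology. IoT technology is attracting renewed attention with the arrival of the 5G era. It is a core enabling technology of the Fourth Industrial Revolution, and major IT companies are working to secure commercialization technologies. A digital door lock requires no key and lets the user request the dispatch of response personnel with a single click in an emergency or safety situation, providing customers with both convenience and security. However, with a password-based digital door lock, the user keeps pressing the same buttons unless the password is changed periodically. Fingerprints then remain at those positions, creating a risk that the password will be exposed. A digital door lock that uses IoT technology, by contrast, allows safe operation and thus residential security. Therefore, to prevent crimes targeting single-person households, we implemented a digital door lock that verifies a person's identity through face recognition, using UART communication between a Raspberry Pi and an Arduino together with machine-learning-based computer vision.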

A Study on Development of Cultural Assets Map Using AR Multi-Marker Recognition Technology (AR다중마커 인식 기술을 활용한 문화재 지도 개발 연구)

  • Kim, Mi-ri; Song, Eun-jee
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2019.05a / pp.87-89 / 2019
  • The existing compulsory curriculum on cultural assets is unlikely to be educationally effective because it lacks interest and visual appeal. The purpose of this study is to develop application content and teaching materials using 3D data of cultural assets, based on the multi-marker recognition technology of augmented reality, which is attracting attention in the era of the 4th industrial revolution. Content built on multi-marker technology, which can recognize multiple markers at once, can incorporate various events and offer enhanced output capability. The proposed augmented reality app will be applied to research on technologies for producing various educational content that uses cultural properties.

Designing an Efficient Reward Function for Robot Reinforcement Learning of The Water Bottle Flipping Task (보틀플리핑의 로봇 강화학습을 위한 효과적인 보상 함수의 설계)

  • Yang, Young-Ha; Lee, Sang-Hyeok; Lee, Cheol-Soo
    • The Journal of Korea Robotics Society / v.14 no.2 / pp.81-86 / 2019
  • Robots are used at various industrial sites, but traditional methods of operating a robot are limited for some kinds of tasks. For a robot to accomplish a task, an accurate formulation of the interaction between the robot and its environment must be found and solved, which is complicated work. Accordingly, reinforcement learning for robots is being actively studied to overcome this difficulty. This study describes the process and results of applying reinforcement learning to one such task. The task the robot learns is bottle flipping, an activity that involves throwing a plastic bottle in an attempt to land it upright on its bottom. The complex movement of the liquid in the bottle once it is thrown into the air makes this task difficult to solve in traditional ways; reinforcement learning makes it easier. After a 3-DOF robotic arm is instructed how to throw the bottle, the robot finds better motions that succeed at the task. Two reward functions are designed, and their learning results are compared. The finite difference method is used to obtain the policy gradient. This paper focuses on the process of designing an efficient reward function to improve the bottle flipping motion.
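
The finite difference approach to policy gradients mentioned in this abstract can be sketched as follows. This is a minimal illustration, not the authors' setup: `toy_return` is a hypothetical smooth stand-in for the average reward of a throw, and the parameter vector, step size `delta`, and learning rate are assumptions. On a real robot each evaluation would be a noisy rollout, so many trials per perturbation would normally be averaged.

```python
def finite_difference_gradient(expected_return, theta, delta=1e-4):
    """Estimate dJ/dtheta by central differences, one parameter at a time."""
    grad = []
    for i in range(len(theta)):
        plus = list(theta)
        minus = list(theta)
        plus[i] += delta
        minus[i] -= delta
        grad.append((expected_return(plus) - expected_return(minus)) / (2.0 * delta))
    return grad


def toy_return(theta):
    """Hypothetical reward surface with its peak at theta = (1, -2)."""
    return -((theta[0] - 1.0) ** 2) - ((theta[1] + 2.0) ** 2)


def gradient_ascent(expected_return, theta, steps=200, learning_rate=0.1):
    """Improve the policy parameters by following the estimated gradient."""
    theta = list(theta)
    for _ in range(steps):
        grad = finite_difference_gradient(expected_return, theta)
        theta = [t + learning_rate * g for t, g in zip(theta, grad)]
    return theta


if __name__ == "__main__":
    print(finite_difference_gradient(toy_return, [0.0, 0.0]))
    print(gradient_ascent(toy_return, [0.0, 0.0]))
```

Because the method only needs reward values and never differentiates the policy itself, the shape of the reward function directly determines the quality of the gradient estimate, which is consistent with the paper's focus on reward design.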

Performance Evaluation of Reinforcement Learning Algorithm for Control of Smart TMD (스마트 TMD 제어를 위한 강화학습 알고리즘 성능 검토)

  • Kang, Joo-Won; Kim, Hyun-Su
    • Journal of Korean Association for Spatial Structures / v.21 no.2 / pp.41-48 / 2021
  • A smart tuned mass damper (TMD) is widely studied for reducing the seismic response of various structures. The control algorithm is the most important factor in the control performance of a smart TMD. This study used Deep Deterministic Policy Gradient (DDPG), a reinforcement learning technique, to develop a control algorithm for a smart TMD. A magnetorheological (MR) damper was used to build the smart TMD. A single-mass model with the smart TMD was employed as the reinforcement learning environment. Time history analysis simulations of the example structure subjected to an artificial seismic load were performed during the reinforcement learning process. An actor (policy network) and a critic (value network) were constructed for the DDPG agent. The action of the DDPG agent was the command voltage sent to the MR damper. The reward for each DDPG action was calculated from the displacement and velocity responses of the main mass. The groundhook control algorithm was used for comparison. After training the DDPG agent model for 10,000 episodes with proper hyperparameters, a semi-active control algorithm for the seismic responses of the example structure with the smart TMD was obtained. The simulation results showed that the developed DDPG model can provide effective control algorithms for a smart TMD to reduce seismic responses.

Development of a Real-time Safest Evacuation Route using Internet of Things and Reinforcement Learning in Case of Fire in a Building (건물 내 화재 발생 시 사물 인터넷과 강화 학습을 활용한 실시간 안전 대피 경로 방안 개발)

  • Ahn, Yusun; Choi, Haneul
    • Journal of the Korean Society of Safety / v.37 no.2 / pp.97-105 / 2022
  • Human casualties from fires are increasing worldwide. The majority of human deaths occur during the evacuation process, as occupants panic and are unaware of the location of the fire and evacuation routes. Using an Internet of Things (IoT) sensor and reinforcement learning, we propose a method to find the safest evacuation route by considering the fire location, flame speed, occupant position, and walking conditions. The first step is detecting the fire with IoT-based devices. The second step is identifying the occupant's position via a beacon connected to the occupant's mobile phone. In the third step, the collected information, flame speed, and walking conditions are input into the reinforcement learning model to derive the optimal evacuation route. This study makes it possible to provide the safest evacuation route for individual occupants in real time. This study is expected to reduce human casualties caused by fires.
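
The route-finding step can be sketched with tabular Q-learning on a toy floor plan. The abstract does not name the reinforcement learning algorithm or the building model, so everything below is an assumption for illustration: the 3x3 grid, the reward values, and the choice to apply the Q-learning update over systematic sweeps of all state-action pairs (rather than sampled episodes) so that the example stays deterministic.

```python
# Hypothetical floor plan: S = occupant start, F = fire cell, E = exit.
GRID = ["S..",
        ".F.",
        "..E"]
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right


def step(state, action):
    """Apply one move; walls keep the occupant in place."""
    row, col = state[0] + action[0], state[1] + action[1]
    if not (0 <= row < len(GRID) and 0 <= col < len(GRID[0])):
        row, col = state
    cell = GRID[row][col]
    if cell == "E":
        return (row, col), 10.0, True     # reached the exit
    if cell == "F":
        return (row, col), -100.0, True   # walked into the fire
    return (row, col), -1.0, False        # each step costs time


def train_q_table(sweeps=500, alpha=0.5, gamma=0.95):
    """Run the Q-learning update over every non-terminal state and action."""
    states = [(r, c) for r in range(len(GRID)) for c in range(len(GRID[0]))]
    q = {(s, a): 0.0 for s in states for a in range(len(ACTIONS))}
    for _ in range(sweeps):
        for s in states:
            if GRID[s[0]][s[1]] in "EF":
                continue
            for a in range(len(ACTIONS)):
                nxt, reward, done = step(s, ACTIONS[a])
                best_next = 0.0 if done else max(
                    q[(nxt, b)] for b in range(len(ACTIONS)))
                q[(s, a)] += alpha * (reward + gamma * best_next - q[(s, a)])
    return q


def greedy_route(q, start=(0, 0), max_steps=20):
    """Follow the highest-valued action from the start until a terminal cell."""
    route, state = [start], start
    for _ in range(max_steps):
        action = max(range(len(ACTIONS)), key=lambda b: q[(state, b)])
        state, _, done = step(state, ACTIONS[action])
        route.append(state)
        if done:
            break
    return route


if __name__ == "__main__":
    print(greedy_route(train_q_table()))
```

In a deployment along the lines the abstract describes, the fire cells and the start position would be refreshed from the IoT sensors and the beacon, and the table would be retrained or updated as the fire spreads.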