• Title/Summary/Keyword: Actor-critic

A Study on Portfolio Asset Allocation Using Actor-Critic Model (Actor-Critic 모델을 이용한 포트폴리오 자산 배분에 관한 연구)

  • Kalina, Bayartsetseg;Lee, Ju-Hong;Song, Jae-Won
    • Proceedings of the Korea Information Processing Society Conference / 2020.05a / pp.439-441 / 2020
  • Existing methods such as equal-weight allocation, Markowitz optimization, and Recurrent Reinforcement Learning maximize returns or minimize risk, while the Risk Budgeting method finds an optimal portfolio by allocating a target risk to each asset. However, these methods often fail to find a portfolio that remains optimal in the future. This paper develops a Deterministic Policy Gradient-based Actor-Critic model for asset allocation and verifies that it outperforms the existing methods.
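
The core of a deterministic-policy-gradient actor-critic for asset allocation can be sketched in a few lines. The following is a minimal PyTorch illustration, not the authors' implementation: the network sizes, the softmax projection onto portfolio weights, and the names `state_dim` and `n_assets` are all assumptions.

```python
import torch
import torch.nn as nn

class Actor(nn.Module):
    """Maps a market-state vector to portfolio weights; softmax keeps them on the simplex."""
    def __init__(self, state_dim, n_assets):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(), nn.Linear(64, n_assets))

    def forward(self, s):
        return torch.softmax(self.net(s), dim=-1)

class Critic(nn.Module):
    """Scores a (state, allocation) pair, i.e. estimates Q(s, a)."""
    def __init__(self, state_dim, n_assets):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim + n_assets, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, s, a):
        return self.net(torch.cat([s, a], dim=-1))

state_dim, n_assets = 20, 5
actor, critic = Actor(state_dim, n_assets), Critic(state_dim, n_assets)
actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-4)

# Deterministic policy gradient step: move the actor toward allocations the
# critic scores highly (the critic itself would be trained on TD targets).
s = torch.randn(32, state_dim)            # a batch of hypothetical market states
actor_loss = -critic(s, actor(s)).mean()  # maximize Q  <=>  minimize -Q
actor_opt.zero_grad()
actor_loss.backward()
actor_opt.step()
```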

Intelligent Warehousing: Comparing Cooperative MARL Strategies

  • Yosua Setyawan Soekamto;Dae-Ki Kang
    • International Journal of Internet, Broadcasting and Communication / v.16 no.3 / pp.205-211 / 2024
  • Effective warehouse management requires advanced resource planning to optimize profits and space. Robots offer a promising solution, but their effectiveness relies on embedded artificial intelligence. Multi-agent reinforcement learning (MARL) enhances robot intelligence in these environments. This study explores various MARL algorithms using the Multi-Robot Warehouse Environment (RWARE) to determine their suitability for warehouse resource planning. Our findings show that cooperative MARL is essential for effective warehouse management. IA2C outperforms MAA2C and VDA2C on smaller maps, while VDA2C excels on larger maps. IA2C's decentralized approach, focusing on cooperation over collaboration, allows for higher reward collection in smaller environments. However, as map size increases, reward collection decreases due to the need for extensive exploration. This study highlights the importance of selecting the appropriate MARL algorithm based on the specific warehouse environment's requirements and scale.
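
For readers who want to reproduce such comparisons, the Multi-Robot Warehouse Environment is available as the open-source `rware` package. A minimal interaction loop, assuming the gym-style API shown in that package's documentation (the environment ID and the step return signature may differ across versions), looks like:

```python
import gym
import rware  # importing registers the rware warehouse environments with gym

env = gym.make("rware-tiny-2ag-v1")  # tiny map with 2 agents; larger maps also exist
obs = env.reset()
for _ in range(100):
    actions = env.action_space.sample()            # one discrete action per agent
    obs, rewards, done, info = env.step(actions)   # per-agent rewards, as IA2C-style training assumes
```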

Suspension Control using Reinforcement Learning (강화학습에 의한 현가장치의 제어)

  • Jeong, Gyu-Baek;Mun, Yeong-Jun;Park, Ju-Yeong
    • Proceedings of the Korean Institute of Intelligent Systems Conference / 2007.11a / pp.163-166 / 2007
  • Recently, research on reinforcement learning has been actively conducted in the artificial intelligence field both in Korea and abroad. In this paper, we apply a reinforcement learning technique based on an RLS-based NAC (natural actor-critic) to the control of an active suspension system and examine its performance through simulation.

Tunnel Ventilation Controller Design Employing RLS-Based Natural Actor-Critic Algorithm (RLS 기반의 Natural Actor-Critic 알고리즘을 이용한 터널 환기제어기 설계)

  • Chu B.;Kim D.;Hong D.;Park J.;Chung J.T.;Kim T.H.
    • Proceedings of the Korean Society of Precision Engineering Conference / 2006.05a / pp.53-54 / 2006
  • The main purpose of a tunnel ventilation system is to keep the CO pollutant level and VI (visibility index) at an adequate level to provide drivers with safe driving conditions. Moreover, it is necessary to minimize the power consumed to operate the ventilation system. To achieve these objectives, the control algorithm used in this research is the reinforcement learning (RL) method. RL is goal-directed learning of a mapping from situations to actions; its goal is to maximize a reward, which is an evaluative feedback from the environment. The reward of the tunnel ventilation system is constructed to include the two objectives listed above. An RL algorithm based on the actor-critic architecture and the natural gradient method is adopted for the system, and recursive least-squares (RLS) is employed in the learning process to improve the efficiency of data use. Simulation results obtained with real data collected from an existing tunnel are provided in this paper. It is confirmed that with the suggested controller, the pollutant level inside the tunnel was well maintained under the allowable limit and energy consumption was improved compared to a conventional control scheme.
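
A reward constructed as the abstract describes, combining pollutant level, visibility, and fan power, might look like the following sketch; the limits, weights, and function name are illustrative placeholders, not values from the paper.

```python
def ventilation_reward(co_ppm, visibility, power_kw,
                       co_limit=25.0, vi_min=0.5,
                       w_co=1.0, w_vi=1.0, w_power=0.01):
    """Penalize CO above the allowable limit, visibility below the minimum,
    and the power drawn by the fans (all thresholds and weights hypothetical)."""
    co_penalty = max(0.0, co_ppm - co_limit)   # positive only when CO exceeds the limit
    vi_penalty = max(0.0, vi_min - visibility) # positive only when visibility is too low
    return -(w_co * co_penalty + w_vi * vi_penalty + w_power * power_kw)
```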

Analysis of Reinforcement Learning Methods for BS Switching Operation (기지국 상태 조정을 위한 강화 학습 기법 분석)

  • Park, Hyebin;Lim, Yujin
    • Asia-pacific Journal of Multimedia Services Convergent with Art, Humanities, and Sociology / v.8 no.2 / pp.351-358 / 2018
  • Reinforcement learning is a machine learning method which aims to determine a policy that yields optimal actions in dynamic and stochastic environments. However, reinforcement learning has high computational complexity and needs a lot of time to find a solution, so it is not easily applicable to uncertain and continuous environments. To tackle the complexity problem, the AC (actor-critic) method is used, which separates an action-value function into a value function and an action-decision policy. In transfer learning, the knowledge constructed in one environment is adapted to another, reducing the time a reinforcement learning method needs to learn. In this paper, we present the AC method and the transfer learning method as remedies for these problems of reinforcement learning. Finally, we analyze a case study in which transfer learning is used to solve the BS (base station) switching problem in wireless access networks.
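
The separation the abstract describes, a value function for evaluation plus a separate policy for action selection, reduces to a short update rule. Below is a generic one-step advantage actor-critic update as a PyTorch sketch with hypothetical `policy` and `value` networks (taking an unbatched state tensor), not the specific scheme analyzed in the paper.

```python
import torch
import torch.nn.functional as F

def ac_update(policy, value, optimizer, s, a, r, s_next, gamma=0.99):
    """One-step advantage actor-critic update with separate policy and value networks."""
    td_target = r + gamma * value(s_next).detach()   # bootstrapped critic target
    advantage = td_target - value(s)                 # was the outcome better than expected?
    log_prob = F.log_softmax(policy(s), dim=-1)[a]   # log pi(a|s) of the taken action
    # Actor term pushes up probabilities of better-than-expected actions;
    # critic term regresses V(s) toward the TD target.
    loss = (-log_prob * advantage.detach() + advantage.pow(2)).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```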

Development of an Actor-Critic Deep Reinforcement Learning Platform for Robotic Grasping in Real World (현실 세계에서의 로봇 파지 작업을 위한 정책/가치 심층 강화학습 플랫폼 개발)

  • Kim, Taewon;Park, Yeseong;Kim, Jong Bok;Park, Youngbin;Suh, Il Hong
    • The Journal of Korea Robotics Society / v.15 no.2 / pp.197-204 / 2020
  • In this paper, we present a learning platform for robotic grasping in the real world, in which actor-critic deep reinforcement learning is employed to learn the grasping skill directly from raw image pixels and rarely observed rewards. This is a challenging task because existing algorithms based on deep reinforcement learning require an extensive amount of training data or massive computational cost, so they are not affordable in real-world settings. To address these problems, the proposed learning platform consists of two training phases: a learning phase in a simulator and subsequent learning in the real world. The main processing blocks in the platform are extraction of a latent vector based on state representation learning and disentanglement of a raw image, generation of an adapted synthetic image using generative adversarial networks, and object detection and arm segmentation for the disentanglement. We demonstrate the effectiveness of this approach in a real environment.

Robot Locomotion via IRPO-based Actor-Critic Learning Method (IRPO 기반 Actor-Critic 학습 기법을 이용한 로봇이동)

  • Kim, Jong-Ho;Kang, Dae-Sung;Park, Joo-Young
    • Proceedings of the KIEE Conference / 2005.07d / pp.2933-2935 / 2005
  • The IRPO (Intensive Randomized Policy Optimizer) algorithm is a recently developed tool in the area of reinforcement learning, and it has been shown to be very successful in several application problems. Compared with general RL methods, IRPO differs in that its policy utilizes the entire history of agent-environment interaction: the policy is derived from the history directly, not through any kind of model of the environment. In this paper, we consider a robot-control problem utilizing the IRPO algorithm. We also developed a MATLAB-based animation program, by which the effectiveness of the training algorithms was observed.

Capacitated Fab Scheduling Approximation using Average Reward TD(λ) Learning based on System Feature Functions (시스템 특성함수 기반 평균보상 TD(λ) 학습을 통한 유한용량 Fab 스케줄링 근사화)

  • Choi, Jin-Young
    • Journal of Korean Society of Industrial and Systems Engineering / v.34 no.4 / pp.189-196 / 2011
  • In this paper, we propose a logical-control-based actor-critic algorithm as an efficient approach to approximating the capacitated fab scheduling problem. We apply the average-reward temporal-difference learning method to estimate the relative value functions of system states, while avoiding deadlock situations using Banker's algorithm. We consider the Intel mini-fab re-entrant line for the evaluation of the suggested algorithm and perform a numerical experiment by randomly generating sample system configurations. We show that the suggested method performs noticeably better than other well-known heuristics.
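
Average-reward TD(λ) estimation of relative value functions has a compact form for linear feature functions. This is a generic sketch under that assumption, not the paper's exact procedure: the step sizes, the trace decay `lam`, and the feature map `phi` (which is assumed to return a NumPy array) are placeholders.

```python
import numpy as np

def avg_reward_td_lambda(phi, transitions, alpha=0.01, beta=0.001, lam=0.9):
    """Average-reward TD(lambda) with linear features.
    phi: state -> feature vector; transitions: list of (s, r, s_next) tuples.
    Returns the relative-value weights w and the average-reward estimate rho."""
    w = np.zeros_like(phi(transitions[0][0]))
    z = np.zeros_like(w)   # eligibility trace
    rho = 0.0              # running estimate of the average reward
    for s, r, s_next in transitions:
        delta = r - rho + w @ phi(s_next) - w @ phi(s)  # average-reward TD error
        z = lam * z + phi(s)   # no discount factor in the average-reward setting
        w = w + alpha * delta * z
        rho = rho + beta * delta
    return w, rho
```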

Design of Rotary Inverted Pendulum System Using Distributed A3C Algorithm (분산 A3C를 활용한 회전식 도립 진자 시스템 설계)

  • Kwon, Do-Hyung;Lim, Hyun-Kyo;Kim, Ju-Bong;Han, Youn-Hee
    • Proceedings of the Korea Information Processing Society Conference / 2019.05a / pp.493-495 / 2019
  • To control the rotary inverted pendulum, one of the most basic systems in the control field, this paper designs multi-device control using Asynchronous Advantage Actor-Critic, which, along with Deep Q-Network, is one of the best-known reinforcement learning algorithms. In the same way as previous work using the Deep Q-Network algorithm, physical agents in the real world are mapped to a virtual environment, and communication between the local agents and the global network is set up through a switch. By employing distributed Asynchronous Advantage Actor-Critic, this paper highlights the applicability of reinforcement learning to multi-agent control in the real world.
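
The asynchronous update at the heart of A3C, local workers pushing gradients to a shared global network, can be skeletonized as below. This is a conceptual sketch only: the real algorithm computes its loss from lock-free rollouts against an environment, whereas here `nn.Linear` and `dummy_loss` stand in for the actor-critic network and the rollout loss.

```python
import copy
import threading
import torch
import torch.nn as nn

# Shared global network and optimizer; nn.Linear stands in for the actual
# actor-critic network of the pendulum controller.
global_net = nn.Linear(4, 2)
global_opt = torch.optim.Adam(global_net.parameters(), lr=1e-3)
lock = threading.Lock()  # true A3C updates lock-free; the lock keeps this sketch simple

def worker_step(local_loss_fn):
    local_net = copy.deepcopy(global_net)  # sync the local copy with the global weights
    loss = local_loss_fn(local_net)        # stand-in for the actor+critic loss on a local rollout
    loss.backward()
    with lock:                             # push the locally computed gradients to the global net
        for gp, lp in zip(global_net.parameters(), local_net.parameters()):
            gp.grad = lp.grad.clone()
        global_opt.step()
        global_opt.zero_grad()

dummy_loss = lambda net: net(torch.randn(1, 4)).pow(2).sum()
threads = [threading.Thread(target=worker_step, args=(dummy_loss,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```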

A Reinforcement Learning-based Method for Multi-user Task Offloading and Resource Allocation in MEC

  • Xiang, Tiange;Joe, Inwhee
    • Proceedings of the Korea Information Processing Society Conference / 2022.05a / pp.45-47 / 2022
  • Mobile edge computing (MEC), which enables mobile terminals to offload computational tasks to a server located at the user's edge, is considered an effective way to reduce the heavy computational burden and achieve efficient computational offloading. In this paper, we study a multi-user MEC system in which multiple user devices (UEs) can offload computation to the MEC server via a wireless channel. To solve the resource allocation and task offloading problem, we take the total cost of latency and energy consumption of all UEs as our optimization objective. To minimize the total cost of the considered MEC system, we propose a DRL-based method to solve the resource allocation problem in wireless MEC. Specifically, we propose an Asynchronous Advantage Actor-Critic (A3C)-based scheme: A3C is applied to this framework and compared with DQN and Double Q-Learning, and simulation results show that this scheme significantly reduces the total cost compared to other resource allocation schemes.
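
The optimization objective described, the total cost of latency and energy over all UEs, is typically a weighted sum. A minimal sketch follows; the function name and the weights `w_t` and `w_e` are assumptions, not values from the paper.

```python
def total_cost(latencies, energies, w_t=0.5, w_e=0.5):
    """Weighted latency-energy cost summed over all UEs (illustrative weights)."""
    return sum(w_t * t + w_e * e for t, e in zip(latencies, energies))

# Example: three UEs with per-task latency (s) and energy (J) after offloading.
print(total_cost([0.12, 0.30, 0.08], [0.9, 1.5, 0.4]))
```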