• Title/Summary/Keyword: Policy Gradient

Search Results: 73

A Study on Application of Reinforcement Learning Algorithm Using Pixel Data (픽셀 데이터를 이용한 강화 학습 알고리즘 적용에 관한 연구)

  • Moon, Saemaro;Choi, Yonglak
    • Journal of Information Technology Services / v.15 no.4 / pp.85-95 / 2016
  • Recently, deep learning and machine learning have attracted considerable attention, and many supporting frameworks have appeared. In the artificial intelligence field, a large body of research is underway to apply the relevant knowledge to complex problem-solving, necessitating the application of various learning algorithms and training methods to artificial intelligence systems. In addition, there is a dearth of performance evaluation of decision-making agents. The decision-making agent designed through this research can find optimal solutions by using reinforcement learning methods: it collects raw pixel data observed from dynamic environments and makes decisions by itself based on those data. The agent uses convolutional neural networks to classify the situations it confronts, and the data observed from the environment undergo preprocessing before being used. This research describes how the convolutional neural networks and the decision-making agent are configured, analyzes learning performance through a value-based algorithm and a policy-based algorithm (a Deep Q-Network and a Policy Gradient), sets forth their differences, and demonstrates how the convolutional neural networks affect overall learning performance when using pixel data. This research is expected to contribute to the improvement of artificial intelligence systems that can efficiently find optimal solutions by using features extracted from raw pixel data.
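
The policy-based side of the comparison above can be illustrated with a minimal REINFORCE-style policy-gradient update. This is a sketch, not the paper's implementation: a linear softmax policy on a flattened feature vector stands in for the CNN feature extractor, and the feature/action sizes are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
n_features, n_actions = 16, 3          # hypothetical sizes
W = np.zeros((n_features, n_actions))  # policy parameters

def softmax(z):
    z = z - z.max()                    # numerically stable
    e = np.exp(z)
    return e / e.sum()

def policy(obs):
    """Action probabilities for a flattened observation vector."""
    return softmax(obs @ W)

def reinforce_update(obs, action, G, lr=0.1):
    """One REINFORCE step: scale grad of log pi(action|obs) by return G."""
    global W
    probs = policy(obs)
    grad_logp = -np.outer(obs, probs)  # -obs * p_j for every action j
    grad_logp[:, action] += obs        # +obs for the chosen action
    W += lr * G * grad_logp

obs = rng.random(n_features)
before = policy(obs)[1]
for _ in range(50):
    reinforce_update(obs, action=1, G=1.0)  # action 1 kept earning reward
after = policy(obs)[1]
```

Repeatedly rewarding one action raises its probability under the softmax policy, which is the core mechanism a Policy Gradient agent relies on.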

Performance Evaluation of Reinforcement Learning Algorithm for Control of Smart TMD (스마트 TMD 제어를 위한 강화학습 알고리즘 성능 검토)

  • Kang, Joo-Won;Kim, Hyun-Su
    • Journal of Korean Association for Spatial Structures / v.21 no.2 / pp.41-48 / 2021
  • A smart tuned mass damper (TMD) is widely studied for seismic response reduction of various structures. The control algorithm is the most important factor for the control performance of a smart TMD. This study used the Deep Deterministic Policy Gradient (DDPG), a reinforcement learning technique, to develop a control algorithm for a smart TMD. A magnetorheological (MR) damper was used to make the smart TMD. A single-mass model with the smart TMD was employed to build the reinforcement learning environment. Time history analysis simulations of the example structure subject to artificial seismic load were performed in the reinforcement learning process. The actor (policy network) and critic (value network) for the DDPG agent were constructed. The action of the DDPG agent was the command voltage sent to the MR damper. The reward for the DDPG action was calculated using the displacement and velocity responses of the main mass. The groundhook control algorithm was used as a comparative control algorithm. After 10,000 episodes of training the DDPG agent model with proper hyper-parameters, a semi-active control algorithm for controlling the seismic responses of the example structure with the smart TMD was developed. The simulation results showed that the developed DDPG model can provide an effective control algorithm for a smart TMD to reduce seismic responses.
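
The reward shaping described above can be sketched as follows. The weights, the penalty form, and the voltage bounds are assumptions for illustration; the paper only states that the reward is computed from the displacement and velocity of the main mass and that the action is the MR damper's command voltage.

```python
import numpy as np

def tmd_reward(displacement, velocity, w_d=1.0, w_v=0.1):
    """Penalize large main-mass responses (weights are hypothetical)."""
    return -(w_d * displacement**2 + w_v * velocity**2)

def clip_voltage(v, v_max=5.0):
    """Clip the agent's action to the MR damper's admissible range
    (the 0..v_max bounds are assumptions, not the paper's values)."""
    return float(np.clip(v, 0.0, v_max))
```

With this shape, the agent's best-case reward is zero (no motion at all), and any residual displacement or velocity drives the reward negative in proportion to its square.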

A Study on the Portfolio Performance Evaluation using Actor-Critic Reinforcement Learning Algorithms (액터-크리틱 모형기반 포트폴리오 연구)

  • Lee, Woo Sik
    • Journal of the Korean Society of Industry Convergence / v.25 no.3 / pp.467-476 / 2022
  • The Bank of Korea raised the benchmark interest rate by a quarter percentage point to 1.75 percent per year, and analysts predict that South Korea's policy rate will reach 2.00 percent by the end of calendar year 2022. Furthermore, because market volatility has been significantly increased by a variety of factors, including rising rates and inflation, many investors have struggled to meet their financial objectives or deliver returns. In this situation, banks and financial institutions are attempting to provide robo-advisors that manage client portfolios without human intervention. In this regard, determining the best hyper-parameter combination is becoming increasingly important. This study compares activation functions of the Deep Deterministic Policy Gradient (DDPG) and Twin-Delayed Deep Deterministic Policy Gradient (TD3) algorithms for choosing a sequence of actions that maximizes long-term reward. According to the results, DDPG and TD3 outperformed their benchmark index. One reason for this is that we need to understand the action probabilities in order to choose an action and receive a reward, which we then compare to the state value to determine an advantage. As interest in machine learning has grown and research into deep reinforcement learning has become more active, finding an optimal hyper-parameter combination for DDPG and TD3 has become increasingly important.

Task offloading scheme based on the DRL of Connected Home using MEC (MEC를 활용한 커넥티드 홈의 DRL 기반 태스크 오프로딩 기법)

  • Ducsun Lim;Kyu-Seek Sohn
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.23 no.6 / pp.61-67 / 2023
  • The rise of 5G and the proliferation of smart devices have underscored the significance of multi-access edge computing (MEC). Amidst this trend, interest in effectively processing computation-intensive and latency-sensitive applications has increased. This study investigated a novel task offloading strategy considering the probabilistic MEC environment to address these challenges. Initially, we considered the frequency of dynamic task requests and the unstable conditions of wireless channels to propose a method for minimizing vehicle power consumption and latency. Subsequently, our research delved into a deep reinforcement learning (DRL) based offloading technique, offering a way to achieve equilibrium between local computation and offloading transmission power. We analyzed the power consumption and queuing latency of vehicles using the deep deterministic policy gradient (DDPG) and deep Q-network (DQN) techniques. Finally, we derived and validated the optimal performance enhancement strategy in a vehicle based MEC environment.

Evaluation of Human Demonstration Augmented Deep Reinforcement Learning Policies via Object Manipulation with an Anthropomorphic Robot Hand (휴먼형 로봇 손의 사물 조작 수행을 이용한 사람 데모 결합 강화학습 정책 성능 평가)

  • Park, Na Hyeon;Oh, Ji Heon;Ryu, Ga Hyun;Lopez, Patricio Rivera;Anazco, Edwin Valarezo;Kim, Tae Seong
    • KIPS Transactions on Software and Data Engineering / v.10 no.5 / pp.179-186 / 2021
  • Manipulation of complex objects with an anthropomorphic robot hand, as with a human hand, is a challenge in human-centric environments. To train an anthropomorphic robot hand with a high degree of freedom (DoF), human demonstration augmented deep reinforcement learning policy optimization methods have been proposed. In this work, we first demonstrate that augmenting deep reinforcement learning (DRL) with human demonstrations is effective for object manipulation by comparing the performance of the augmentation-free Natural Policy Gradient (NPG) and Demonstration Augmented NPG (DA-NPG). Then three DRL policy optimization methods, namely NPG, Trust Region Policy Optimization (TRPO), and Proximal Policy Optimization (PPO), are evaluated with DA (i.e., DA-NPG, DA-TRPO, and DA-PPO) and without DA by manipulating six objects: an apple, a banana, a bottle, a light bulb, a camera, and a hammer. The results show that DA-NPG achieved an average success rate of 99.33%, whereas NPG achieved only 60%. In addition, DA-NPG succeeded in grasping all six objects, while DA-TRPO and DA-PPO failed to grasp some objects and showed unstable performance.
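
One common way demonstration augmentation is realized, sketched below, is to mix human demonstration transitions into each training batch alongside the agent's own rollouts. This is a generic illustration of the idea, not DA-NPG itself; the `demo_frac` ratio and buffer layout are assumptions.

```python
import random

def sample_batch(agent_buffer, demo_buffer, batch_size=32, demo_frac=0.25):
    """Draw a training batch that mixes agent and demonstration data.

    demo_frac is a hypothetical mixing ratio; actual DA methods may
    instead add a demo term to the policy-optimization objective.
    """
    n_demo = int(batch_size * demo_frac)
    batch = (random.sample(demo_buffer, min(n_demo, len(demo_buffer)))
             + random.sample(agent_buffer,
                             min(batch_size - n_demo, len(agent_buffer))))
    random.shuffle(batch)
    return batch

# Toy transition buffers, tagged by origin for inspection.
agent_buffer = [("agent", i) for i in range(100)]
demo_buffer = [("demo", i) for i in range(100)]
batch = sample_batch(agent_buffer, demo_buffer)
```

Keeping a fixed fraction of demonstration data in every batch anchors the policy near behaviors that are known to grasp objects successfully, which is consistent with the large NPG-to-DA-NPG success-rate gap reported above.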

Research on the application of Machine Learning to threat assessment of combat systems

  • Seung-Joon Lee
    • Journal of the Korea Society of Computer and Information / v.28 no.7 / pp.47-55 / 2023
  • This paper presents a method for predicting the threat index of combat systems using Gradient Boosting Regressors and Support Vector Regressors among machine learning models. Combat systems are currently software that emphasizes safety and reliability, so the application of AI technology whose reliability is not guaranteed is restricted by policy; as a result, fielded domestic combat systems are not equipped with AI technology. However, in order to respond to the policy direction of the Ministry of National Defense, which aims to field AI, we conducted a study to secure the basic technology required for applying machine learning in combat systems. After collecting the data required for threat index evaluation, the study determined the prediction accuracy of the trained model by processing and refining the data, selecting the machine learning model, and selecting the optimal hyper-parameters. As a result, the model score for the test data was over 99 points, confirming the applicability of machine learning models to combat systems.
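
The gradient-boosting scheme behind a Gradient Boosting Regressor can be sketched with depth-1 stumps fit to residuals. This toy version is for illustration only: the 1-D feature, the step-like "threat scores", and the learning rate are fabricated, and the study would have used full library models rather than this hand-rolled loop.

```python
import numpy as np

def fit_stump(x, r):
    """Best single-threshold stump minimizing squared error on residual r."""
    best = (np.inf, None, 0.0, 0.0)
    for t in x:
        left = x <= t
        if left.all():
            continue  # split must leave both sides non-empty
        lm, rm = r[left].mean(), r[~left].mean()
        err = ((r[left] - lm) ** 2).sum() + ((r[~left] - rm) ** 2).sum()
        if err < best[0]:
            best = (err, t, lm, rm)
    _, t, lm, rm = best
    return lambda q, t=t, lm=lm, rm=rm: np.where(q <= t, lm, rm)

def boost(x, y, rounds=50, lr=0.3):
    """Additive model: constant base + lr-shrunk stumps on residuals."""
    base = y.mean()
    pred = np.full_like(y, base, dtype=float)
    stumps = []
    for _ in range(rounds):
        s = fit_stump(x, y - pred)   # fit the current residual
        stumps.append(s)
        pred = pred + lr * s(x)
    return lambda q: base + lr * sum(s(q) for s in stumps)

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([0.0, 0.1, 0.2, 1.0, 1.1, 1.2])   # toy threat scores
model = boost(x, y)
mse = float(np.mean((model(x) - y) ** 2))
```

Each round fits only the part of the target the ensemble has not yet explained, so training error shrinks steadily; the shrinkage factor `lr` trades convergence speed for robustness, which is exactly the kind of hyper-parameter the study tunes.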

Market Timing and Seasoned Equity Offering (마켓 타이밍과 유상증자)

  • Sung Won Seo
    • Asia-Pacific Journal of Business / v.15 no.1 / pp.145-157 / 2024
  • Purpose - In this study, we propose an empirical model for predicting seasoned equity offerings (SEOs hereafter) using machine learning methods. Design/methodology/approach - The models utilize the random forest method, based on decision trees that capture non-linear relationships, as well as the gradient boosting tree model. SEOs incur significant direct and indirect costs. Therefore, CEOs decide on seasoned equity issuances only when the benefits outweigh the costs, which leads to a non-linear relationship between SEOs and their determinants. In particular, a variable related to market timing effectively exhibits such non-linear relations. Findings - To account for these non-linear relationships, we hypothesize that decision tree-based random forest and gradient boosting tree models are more suitable than linear methodologies. The results of this study support this hypothesis. Research implications or Originality - We expect that our findings can provide meaningful information to investors and policy makers by classifying companies likely to undergo SEOs.

Singularity Avoidance Path Planning on Cooperative Task of Dual Manipulator Using DDPG Algorithm (DDPG 알고리즘을 이용한 양팔 매니퓰레이터의 협동작업 경로상의 특이점 회피 경로 계획)

  • Lee, Jonghak;Kim, Kyeongsoo;Kim, Yunjae;Lee, Jangmyung
    • The Journal of Korea Robotics Society / v.16 no.2 / pp.137-146 / 2021
  • When controlling a manipulator, degrees of freedom are lost at a singularity, so certain joint velocities do not propagate to the end effector. In addition, a control problem occurs because the Jacobian inverse matrix cannot be calculated. To avoid singularities, we apply the Deep Deterministic Policy Gradient (DDPG), a reinforcement learning algorithm that rewards behavior according to actions and then determines high-reward actions in simulation. DDPG is off-policy: it uses an ε-greedy policy for selecting the action of the current time step and a greedy policy for the next step. In the simulation, a negative reward is given when moving near the singularity, and a positive reward when moving away from the singularity and toward the target point. The reward equation consists of the distances to the target point and to the singularity, the manipulability, and an arrival flag. The dual-arm manipulators hold a long rod at the same time, and experiments are conducted to avoid singularities along the simulated path. If, in the learning process, the object to be avoided is set as a region rather than a point, obstacle avoidance is expected to become possible in future research.
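
The reward structure described above (distance terms, manipulability, arrival flag) can be sketched as below. The weights, thresholds, and penalty magnitudes are assumptions, not the paper's values; manipulability is computed with Yoshikawa's standard measure, which goes to zero at a singularity.

```python
import numpy as np

def manipulability(J):
    """Yoshikawa's measure sqrt(det(J J^T)); zero at a singularity."""
    return float(np.sqrt(max(np.linalg.det(J @ J.T), 0.0)))

def reward(pos, target, J, arrived,
           w_dist=1.0, w_manip=0.5, sing_eps=1e-3):
    """Hypothetical reward: penalize distance to target and near-singular
    poses, reward manipulability and arrival (all weights assumed)."""
    r = -w_dist * float(np.linalg.norm(pos - target))
    m = manipulability(J)
    r += w_manip * m
    if m < sing_eps:
        r -= 10.0    # negative reward near the singularity
    if arrived:
        r += 100.0   # arrival-flag bonus
    return r
```

Because the manipulability term vanishes exactly where the Jacobian loses rank, the agent is pushed away from singular configurations without the controller ever needing the (non-existent) Jacobian inverse there.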

Cloud Task Scheduling Based on Proximal Policy Optimization Algorithm for Lowering Energy Consumption of Data Center

  • Yang, Yongquan;He, Cuihua;Yin, Bo;Wei, Zhiqiang;Hong, Bowei
    • KSII Transactions on Internet and Information Systems (TIIS) / v.16 no.6 / pp.1877-1891 / 2022
  • As a part of cloud computing technology, algorithms for cloud task scheduling have an important influence on the area of cloud computing in data centers. In our earlier work, we proposed DeepEnergyJS, which was designed based on the original policy-gradient reinforcement learning algorithm, and we verified its effectiveness through simulation experiments. In this study, we used the Proximal Policy Optimization (PPO) algorithm to update DeepEnergyJS to DeepEnergyJSV2.0. First, we verify the convergence of the PPO algorithm on the Alibaba Cluster Data V2018 dataset. Then we contrast it with the policy-gradient reinforcement learning algorithm in terms of convergence rate, converged value, and stability. The results indicate that PPO performed better on the training and test data sets than the policy-gradient reinforcement learning algorithm, as well as other general heuristic algorithms such as First Fit, Random, and Tetris. DeepEnergyJSV2.0 achieves better energy efficiency than DeepEnergyJS by about 7.814%.
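
The stability advantage of PPO over a plain policy gradient comes from its clipped surrogate objective, sketched below. This illustrates the standard PPO loss only; it is not DeepEnergyJSV2.0, and the example ratios/advantages are fabricated.

```python
import numpy as np

def ppo_clip_loss(ratio, advantage, eps=0.2):
    """Standard PPO clipped surrogate loss (to be minimized).

    ratio = pi_new(a|s) / pi_old(a|s); clipping to [1-eps, 1+eps]
    caps how much a single update can exploit one batch of data.
    """
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
    return float(-np.minimum(unclipped, clipped).mean())

ratios = np.array([0.5, 1.0, 1.5])       # toy probability ratios
advantages = np.array([1.0, 1.0, 1.0])   # toy advantages
loss = ppo_clip_loss(ratios, advantages)
```

Even a ratio of 10 with a positive advantage contributes at most `1 + eps` to the objective, so the update step stays bounded; an unconstrained policy gradient has no such cap, which is one reason PPO tends to converge more stably in comparisons like the one above.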

Optimal Inventory and Price Markdown Policy for a Two-Layer Market with Demand being Price and Time Dependent

  • Jeon, Seong-Hye;Sung, Chang-Sup
    • Proceedings of the Korean Operations and Management Science Society Conference / 2006.11a / pp.142-146 / 2006
  • This paper considers an SCM issue concerned with an integrated problem of inventory control and dynamic pricing strategies when demands are price- and time-dependent. The associated price markdowns are conducted for inventory control in a two-layer market consisting of a retailer and an outlet, as in the fashion apparel market. The objective function consists of revenue terms (sales revenue and salvage value) and a purchasing cost term. Specifically, decisions on price markdowns and order quantity are made to maximize total profit in the supply chain so as to have zero inventory at the end of the sales horizon. To solve the proposed problem, a gradient method is applied, which yields an optimal decision on both the initial inventory level and the discount pricing policy. Sensitivity analysis is conducted on the demand parameters, and final comments on the practical use of the proposed model are presented.
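
A gradient method of the kind applied above can be illustrated on a toy single-price problem. The linear demand model d(p) = a - b·p, the unit cost c, and the step size are all fabricated for illustration; the paper's actual model has time-dependent demand and a markdown schedule.

```python
# Toy profit maximization by gradient ascent on price p,
# with linear demand d(p) = a - b*p and unit cost c (all assumed).
a, b, c = 100.0, 2.0, 10.0

def profit(p):
    return (p - c) * (a - b * p)

def d_profit(p):
    # derivative of (p - c)(a - b p) with respect to p
    return a - 2.0 * b * p + b * c

p = 15.0                       # arbitrary starting price
for _ in range(500):
    p += 0.01 * d_profit(p)    # gradient ascent step

p_opt = (a + b * c) / (2.0 * b)  # analytic optimum for comparison
```

Because the profit function is concave in p, the iterates converge to the analytic optimum; in the paper's richer setting the same first-order logic is applied jointly to the markdown schedule and the initial inventory level.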
