• Title/Summary/Keyword: reinforcement algorithms

Reward Design of Reinforcement Learning for Development of Smart Control Algorithm (스마트 제어알고리즘 개발을 위한 강화학습 리워드 설계)

  • Kim, Hyun-Su;Yoon, Ki-Yong
    • Journal of Korean Association for Spatial Structures
    • /
    • v.22 no.2
    • /
    • pp.39-46
    • /
    • 2022
  • Recently, machine learning has been widely used to solve optimization problems in various engineering fields. In this study, machine learning is applied to the development of a control algorithm for a smart control device for the reduction of seismic responses. For this purpose, the Deep Q-network (DQN), one of the reinforcement learning algorithms, was employed to develop the control algorithm. A single degree of freedom (SDOF) structure with a smart tuned mass damper (TMD) was used as an example structure. The smart TMD system employed an MR (magnetorheological) damper instead of a passive damper. The reward design of reinforcement learning strongly affects the control performance of the smart TMD. Various hyper-parameters were investigated to optimize the control performance of the DQN-based control algorithm. Usually, decreasing the time step of a numerical simulation is desirable to increase the accuracy of the simulation results. However, the numerical simulation results showed that decreasing the time step for reward calculation may degrade the control performance of the DQN-based control algorithm. Therefore, a proper time step for reward calculation should be selected in the DQN training process.
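
The time-step observation above is easy to illustrate. The following is a minimal sketch, not the paper's model: the SDOF response is a random stand-in and both step sizes are assumed values, but it shows how a reward can be evaluated on a coarser grid than the numerical simulation itself.

```python
import random

# Hypothetical sketch: the simulation advances at SIM_DT, but the DQN reward
# is computed only once per REWARD_DT window, since the abstract notes the
# two time steps need not coincide. All values here are assumptions.
SIM_DT = 0.001        # numerical integration time step (s), assumed
REWARD_DT = 0.01      # reward evaluation time step (s), assumed
STEPS_PER_REWARD = round(REWARD_DT / SIM_DT)

def reward_from_window(displacements, uncontrolled_peak):
    """Reward = normalized reduction of peak displacement over the window."""
    controlled_peak = max(abs(d) for d in displacements)
    return 1.0 - controlled_peak / uncontrolled_peak

window = []
for step in range(1000):
    displacement = random.gauss(0.0, 0.02)   # stand-in for the SDOF response
    window.append(displacement)
    if len(window) == STEPS_PER_REWARD:
        r = reward_from_window(window, uncontrolled_peak=0.05)
        window.clear()
        # r would be fed into the DQN update at this point
```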

Adaptive Success Rate-based Sensor Relocation for IoT Applications

  • Kim, Moonseong;Lee, Woochan
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.15 no.9
    • /
    • pp.3120-3137
    • /
    • 2021
  • Small-sized IoT wireless sensing devices can be deployed from small aircraft such as drones, and deployed mobile IoT devices can be relocated to suit data collection using efficient relocation algorithms. However, the shape of the terrain may be unpredictable. Mobile IoT devices suited to such terrains are hopping devices that can move by jumping. So far, most hopping sensor relocation studies have made the unrealistic assumption that all hopping devices know the overall state of the entire network and each device's current state. Recent work has proposed more realistic relocation algorithms based on a distributed network environment, which do not require all information to be shared simultaneously. However, since the shortest-path-based algorithm performs communication and movement requests through terminals, it is not suitable for areas where the distribution of obstacles is uneven. The proposed scheme applies a simple Monte Carlo method based on a relay node selection random variable that reflects the characteristics of the obstacle distribution, choosing the best relay node through reinforcement learning rather than relying on fixed relay nodes. Using the relay node selection random variable can significantly reduce the additional messages generated when selecting the shortest path. An additional contribution of this paper is the first distributed environment-based relocation protocol that reflects the characteristics of real-world physical devices, implemented in the OMNeT++ simulator. We also reconstruct a three-day disaster environment, and performance evaluation is carried out by applying the proposed protocol to this simulated real-world environment.
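
The relay selection idea reads naturally as weighted sampling. A minimal sketch, assuming invented per-node hop-success counts in place of a real obstacle distribution:

```python
import random

# Hypothetical sketch: choose a relay node by sampling a "relay node
# selection" random variable whose weights reflect past hop-success rates
# (a proxy for the local obstacle distribution), rather than always taking
# the shortest-path relay. All counts below are invented.
success = {"A": 8, "B": 3, "C": 6}    # successful hops per candidate (assumed)
attempts = {"A": 10, "B": 9, "C": 7}  # total attempts per candidate (assumed)

def pick_relay():
    candidates = list(success)
    weights = [success[c] / attempts[c] for c in candidates]
    return random.choices(candidates, weights=weights, k=1)[0]

counts = {c: 0 for c in success}
for _ in range(10_000):
    counts[pick_relay()] += 1
print(counts)  # relays with higher observed success rates are chosen more often
```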

A Study on the Portfolio Performance Evaluation using Actor-Critic Reinforcement Learning Algorithms (액터-크리틱 모형기반 포트폴리오 연구)

  • Lee, Woo Sik
    • Journal of the Korean Society of Industry Convergence
    • /
    • v.25 no.3
    • /
    • pp.467-476
    • /
    • 2022
  • The Bank of Korea raised the benchmark interest rate by a quarter percentage point to 1.75 percent per year, and analysts predict that South Korea's policy rate will reach 2.00 percent by the end of calendar year 2022. Furthermore, because market volatility has increased significantly due to a variety of factors, including rising rates and inflation, many investors have struggled to meet their financial objectives or deliver returns. In this situation, banks and financial institutions are attempting to provide Robo-Advisors that manage client portfolios without human intervention. In this regard, determining the best hyper-parameter combination is becoming increasingly important. This study compares several activation functions of the Deep Deterministic Policy Gradient (DDPG) and Twin-Delayed Deep Deterministic Policy Gradient (TD3) algorithms for choosing a sequence of actions that maximizes long-term reward. According to the results, DDPG and TD3 outperformed their benchmark index. One reason for this is that the action probabilities must be understood in order to choose an action and receive a reward, which is then compared to the state value to determine an advantage. As interest in machine learning has grown and research into deep reinforcement learning has become more active, finding an optimal hyper-parameter combination for DDPG and TD3 has become increasingly important.
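
The activation-function comparison can be pictured with a toy actor network. This is a sketch only: a hand-rolled one-hidden-layer actor with random weights and an assumed 4-feature state, not the paper's DDPG/TD3 architecture.

```python
import numpy as np

# Minimal sketch (not the paper's model): a one-hidden-layer actor whose
# hidden activation can be swapped, to illustrate the kind of activation-
# function comparison the study performs for DDPG/TD3 actors.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(16, 4)), np.zeros(16)
W2, b2 = rng.normal(size=(3, 16)), np.zeros(3)

ACTIVATIONS = {
    "relu": lambda x: np.maximum(0.0, x),
    "tanh": np.tanh,
    "leaky_relu": lambda x: np.where(x > 0, x, 0.01 * x),
}

def actor(state, activation):
    hidden = ACTIVATIONS[activation](W1 @ state + b1)
    z = W2 @ hidden + b2
    weights = np.exp(z - z.max())        # stable softmax
    return weights / weights.sum()       # portfolio weights sum to 1

state = rng.normal(size=4)               # stand-in market features
for name in ACTIVATIONS:
    print(name, actor(state, name).round(3))
```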

Learning soccer robot using genetic programming

  • Wang, Xiaoshu;Sugisaka, Masanori
    • Institute of Control, Robotics and Systems: Conference Proceedings
    • /
    • 1999.10a
    • /
    • pp.292-297
    • /
    • 1999
  • Evolving artificial agents is an extremely difficult but challenging task. At present, studies have mainly centered on the single-agent learning problem. In our case, we use simulated soccer to investigate multi-agent cooperative learning. Considering the fundamental differences in learning mechanisms, existing reinforcement learning algorithms can be roughly classified into two types: those based on evaluation functions and those that search the policy space directly. Genetic Programming, developed from Genetic Algorithms, is one of the best-known approaches of the latter type. In this paper, we first give a detailed algorithm description as well as the data construction necessary for learning single-agent strategies. In the following step, we extend the developed methods to the multi-robot domain of the soccer game. We investigate and contrast two different methods, simple team learning and sub-group learning, and conclude the paper with some experimental results.
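
The policy-space search that Genetic Programming performs can be sketched in a few lines. Everything below is illustrative: a toy steering task, root-level crossover, and truncation selection, not the paper's soccer setup.

```python
import random

# Toy GP sketch: evolve expression trees mapping (ball_dx, ball_dy) to a
# steering command. Task, operators, and fitness are invented for
# illustration; real GP would also use subtree mutation and deeper trees.
OPS = ["+", "-", "*"]

def random_tree(depth=2):
    if depth == 0 or random.random() < 0.3:
        return random.choice(["ball_dx", "ball_dy", round(random.uniform(-1, 1), 2)])
    return [random.choice(OPS), random_tree(depth - 1), random_tree(depth - 1)]

def evaluate(tree, env):
    if isinstance(tree, list):
        op, a, b = tree
        x, y = evaluate(a, env), evaluate(b, env)
        return {"+": x + y, "-": x - y, "*": x * y}[op]
    return env.get(tree, tree)           # variable lookup or numeric constant

def fitness(tree):
    # Reward programs whose output tracks ball_dx (steer toward the ball).
    cases = [{"ball_dx": dx / 5, "ball_dy": dy / 5}
             for dx in range(-5, 6) for dy in range(-5, 6)]
    return -sum((evaluate(tree, c) - c["ball_dx"]) ** 2 for c in cases)

def crossover(a, b):
    # Root-level crossover: graft one subtree from each parent under a new op.
    pick = lambda t: random.choice(t[1:]) if isinstance(t, list) else t
    return [random.choice(OPS), pick(a), pick(b)]

pop = [random_tree() for _ in range(50)]
for _ in range(20):
    pop.sort(key=fitness, reverse=True)
    parents = pop[:25]                   # truncation selection
    pop = parents + [crossover(*random.sample(parents, 2)) for _ in range(25)]
print("best fitness:", max(fitness(t) for t in pop))
```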

Development of Artificial Intelligence Janggi Game based on Machine Learning Algorithm (기계학습 알고리즘 기반의 인공지능 장기 게임 개발)

  • Jang, Myeonggyu;Kim, Youngho;Min, Dongyeop;Park, Kihyeon;Lee, Seungsoo;Woo, Chongwoo
    • Journal of Information Technology Services
    • /
    • v.16 no.4
    • /
    • pp.137-148
    • /
    • 2017
  • Research on artificial intelligence has been explosively active in various fields since the advent of AlphaGo. In particular, the application of multi-layer neural networks such as deep learning and of various machine learning algorithms is being actively studied. In this paper, we describe the development of an artificial intelligence Janggi game based on a reinforcement learning algorithm and the MCTS (Monte Carlo Tree Search) algorithm with accumulated game data. Previous artificial intelligence games were mostly developed based on the mini-max algorithm, which depends only on the results of tree search algorithms. They can neither use real data from game experts nor enhance their performance by learning. In this paper, we suggest our approach to overcome those limitations as follows. First, we collect Janggi experts' game data, which reflects abundant real game results. Second, we create a graph structure from the game data, which removes redundant movements. Third, we apply the reinforcement learning algorithm and the MCTS algorithm to select the best next move. In addition, the learned graph is stored by object serialization to provide continuity of the game. The experiments in this study were done in two different ways. First, our system was confronted with another AI-based system currently being served on the internet. Second, our system was confronted with Janggi experts who have winning records of more than 50%. Experimental results show that the winning rate of our system is significantly higher.
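
The move-selection core of MCTS is the UCT rule. A minimal sketch with fabricated node statistics, assuming the standard UCB1 exploration term (the abstract does not state the exact formula used):

```python
import math

# Minimal UCT selection sketch, the core of the MCTS component described
# above; the per-move (wins, visits) statistics are fabricated.
def uct_score(wins, visits, parent_visits, c=1.414):
    if visits == 0:
        return float("inf")           # explore unvisited moves first
    return wins / visits + c * math.sqrt(math.log(parent_visits) / visits)

children = {"move_a": (7, 10), "move_b": (4, 5), "move_c": (0, 0)}
parent_visits = sum(v for _, v in children.values())
best = max(children, key=lambda m: uct_score(*children[m], parent_visits))
print(best)   # untried moves win ties via the infinite exploration bonus
```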

Reinforcement Learning-based Dynamic Weapon Assignment to Multi-Caliber Long-Range Artillery Attacks (다종 장사정포 공격에 대한 강화학습 기반의 동적 무기할당)

  • Hyeonho Kim;Jung Hun Kim;Joohoe Kong;Ji Hoon Kyung
    • Journal of Korean Society of Industrial and Systems Engineering
    • /
    • v.45 no.4
    • /
    • pp.42-52
    • /
    • 2022
  • North Korea continues to upgrade and display its long-range rocket launchers to emphasize its military strength. Recently, the Republic of Korea kicked off the development of an anti-artillery interception system, similar to Israel's "Iron Dome", designed to protect against North Korea's arsenal of long-range rockets. The system cannot work smoothly without a function that assigns interceptors to incoming artillery rockets of various calibers. We view the assignment task as a dynamic weapon target assignment (DWTA) problem. DWTA is a multistage decision process in which a decision at one stage affects the decision processes and their results at subsequent stages. We represent the DWTA problem as a Markov decision process (MDP). The distance from Seoul to North Korea's multiple rocket launchers positioned near the border limits the processing time of the model solver to only a few seconds. It is impossible to compute the exact optimal solution within the allowed time interval due to the curse of dimensionality inherent in the MDP model of a practical DWTA problem. We apply two reinforcement learning-based algorithms to obtain an approximate solution of the MDP model within the time limit. To check the quality of the approximate solution, we adopt the Shoot-Shoot-Look (SSL) policy as a baseline. Simulation results show that both algorithms provide better solutions than the baseline strategy.
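
The Shoot-Shoot-Look baseline is simple to state in code. A sketch under an assumed single-shot kill probability; the force sizes are likewise invented, not values from the paper.

```python
import random

# Illustrative Shoot-Shoot-Look (SSL) baseline: fire a two-interceptor salvo
# at each threat, then look (assess) before engaging the next one. The kill
# probability and force sizes are assumptions.
P_KILL = 0.7
random.seed(1)

def ssl_engage(threats, interceptors):
    killed = 0
    for _ in range(threats):
        if interceptors < 2:
            break
        interceptors -= 2                              # shoot, shoot
        if random.random() < 1 - (1 - P_KILL) ** 2:    # look: salvo kill prob.
            killed += 1
    return killed

print(ssl_engage(threats=10, interceptors=16))   # at most 8 salvos available
```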

Genetic algorithm-based geometric and reinforcement limits for cost effective design of RC cantilever retaining walls

  • Mansoor Shakeel;Rizwan Azam;Muhammad R. Riaz
    • Structural Engineering and Mechanics
    • /
    • v.86 no.3
    • /
    • pp.337-348
    • /
    • 2023
  • The optimization of reinforced concrete (RC) cantilever retaining walls is a complex problem and requires the use of advanced techniques like metaheuristic algorithms. For this purpose, an optimization model must first be developed, which involves mathematical complications, multidisciplinary knowledge, and programming skills. This task has proven too arduous and has halted the mainstream acceptance of optimization. Therefore, it is necessary to unravel the complications of optimization into an easily applicable form. Currently, the most common method for designing retaining walls is to follow the proportioning limits provided by the ACI handbook. However, these limits, derived manually, have not been verified by any optimization technique. There is a need to validate or modify these limits using optimization algorithms so that they can be considered optimal limits. Therefore, this study aims to propose updated proportioning limits for the economical design of an RC cantilever retaining wall through a comprehensive parametric investigation using the genetic algorithm (GA). Multiple simulations are run to examine various design parameters, and trends are drawn to determine effective ranges. The optimal limits are derived for 5 geometric and 3 reinforcement variables and validated by comparison with their predecessor, ACI's preliminary proportioning limits. The results indicate close proximity between the optimized and code-provided ranges; however, the use of the optimal limits can lead to additional cost savings. Modifications to achieve further optimization are also discussed. Besides the geometric variables, other design parameters not covered by the ACI building code, such as reinforcement ratios, bar diameters, and material strengths, and their effects on cost optimization are also discussed. The findings of this investigation can be used by experienced engineers to refine their designs without delving into the complexities of optimization.
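
The parametric GA search can be miniaturized to two design ratios. The cost model and feasibility check below are placeholders, not ACI design equations; only the GA mechanics (selection plus arithmetic crossover) are the point.

```python
import random

# Minimal GA sketch over two illustrative design ratios (base-width and
# toe-width ratios); the cost function and the feasibility check are
# stand-ins, not the paper's structural model or the ACI provisions.
random.seed(0)

def cost(ind):
    base_ratio, toe_ratio = ind
    return 3.0 * base_ratio + 1.5 * toe_ratio        # stand-in material cost

def feasible(ind):
    base_ratio, toe_ratio = ind
    return 0.4 <= base_ratio <= 0.7 and 0.1 <= toe_ratio <= base_ratio / 2

def random_ind():
    return (random.uniform(0.4, 0.7), random.uniform(0.05, 0.35))

pop = [ind for ind in (random_ind() for _ in range(200)) if feasible(ind)]
for _ in range(30):
    pop.sort(key=cost)
    parents = pop[: len(pop) // 2]                   # keep the cheaper half
    children = [tuple((a + b) / 2 for a, b in zip(*random.sample(parents, 2)))
                for _ in range(len(parents))]        # arithmetic crossover
    pop = parents + [c for c in children if feasible(c)]
print("best ratios:", tuple(round(x, 3) for x in min(pop, key=cost)))
```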

Modeling shear capacity of RC slender beams without stirrups using genetic algorithms

  • Nehdi, M.;Greenough, T.
    • Smart Structures and Systems
    • /
    • v.3 no.1
    • /
    • pp.51-68
    • /
    • 2007
  • High-strength concrete (HSC) is becoming increasingly attractive for various construction projects since it offers a multitude of benefits over normal-strength concrete (NSC). Unfortunately, current design provisions for the shear capacity of RC slender beams are generally based on data developed for NSC members having a compressive strength of up to 50 MPa, with limited recommendations on the use of HSC. The failure of HSC beams is noticeably different from that of NSC beams since the transition zone between the cement paste and aggregates is much denser in HSC. Thus, unlike NSC beams, in which micro-cracks propagate around aggregates and provide significant aggregate interlock, micro-cracks in HSC are trans-granular, resulting in relatively smoother fracture surfaces, thereby inhibiting aggregate interlock as a shear transfer mechanism and reducing the influence of compressive strength on the ultimate shear strength of HSC beams. In this study, a new approach based on genetic algorithms (GAs) was used to predict the shear capacity of both NSC and HSC slender beams without shear reinforcement. Shear capacity predictions of the GA model were compared with calculations from four other commonly used methods: the ACI method, the CSA method, Eurocode-2, and Zsutty's equation. A parametric study was conducted to evaluate the ability of the GA model to capture the effect of basic shear design parameters on the behaviour of reinforced concrete (RC) beams under shear loading. The parameters investigated include compressive strength, the amount of longitudinal reinforcement, and beam depth. It was found that the GA model provided a more accurate evaluation of shear capacity than the other common methods and better captured the influence of the significant shear design parameters. Therefore, the GA model offers an attractive, user-friendly alternative to conventional shear design methods.
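
For scale, the ACI simplified concrete shear strength in SI units is Vc = 0.17·√f'c·bw·d (MPa and mm in, N out). The sketch compares it with a placeholder power-law expression of the kind a GA might evolve; the placeholder's form and coefficients are invented, not the paper's model.

```python
import math

# Comparison sketch: the ACI simplified concrete shear strength against a
# placeholder "GA-evolved" expression. The latter's coefficients are
# invented for illustration only.
def vc_aci(fc, bw, d):
    """ACI simplified: Vc = 0.17*sqrt(f'c)*bw*d (MPa, mm -> N), returned in kN."""
    return 0.17 * math.sqrt(fc) * bw * d / 1000.0

def vc_ga_like(fc, rho, bw, d):
    # Hypothetical evolved form combining strength, steel ratio, and depth.
    return 0.9 * (fc * rho) ** (1 / 3) * bw * d / 1000.0

for fc in (30, 60, 90):                          # NSC through HSC (MPa)
    print(fc, round(vc_aci(fc, 300, 500), 1),
          round(vc_ga_like(fc, 0.02, 300, 500), 1))
```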

R-Trader: An Automatic Stock Trading System based on Reinforcement learning (R-Trader: 강화 학습에 기반한 자동 주식 거래 시스템)

  • Lee, Jae-Won;Kim, Sung-Dong;Lee, Jong-Woo;Chae, Jin-Seok
    • Journal of KIISE:Software and Applications
    • /
    • v.29 no.11
    • /
    • pp.785-794
    • /
    • 2002
  • Automatic stock trading systems should be able to solve various kinds of optimization problems, such as market trend prediction, stock selection, and trading strategies, in a unified framework. However, most previous trading systems based on supervised learning are limited in their ultimate performance because they are not primarily concerned with the integration of those subproblems. This paper proposes a stock trading system, called R-Trader, based on reinforcement learning, regarding the process of stock price changes as a Markov decision process (MDP). Reinforcement learning is suitable for the joint optimization of predictions and trading strategies. R-Trader adopts two popular reinforcement learning algorithms, temporal-difference (TD) learning and Q-learning, for selecting stocks and optimizing other trading parameters, respectively. Technical analysis is also adopted to devise the input features of the system, and value functions are approximated by feedforward neural networks. Experimental results on the Korean stock market show that the proposed system outperforms both the market average and a simple trading system trained by supervised learning, in terms of both profit and risk management.
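
The TD component can be shown with a tabular TD(0) update (the paper itself approximates values with feedforward neural networks; the states, rewards, and learning rate below are invented):

```python
# Minimal tabular TD(0) value-update sketch of the kind R-Trader builds on
# for stock selection; the market states and returns are illustrative.
ALPHA, GAMMA = 0.1, 0.95
value = {"up": 0.0, "flat": 0.0, "down": 0.0}

episode = [("up", 0.02, "flat"), ("flat", -0.01, "down"), ("down", 0.01, "up")]
for state, reward, next_state in episode:
    td_error = reward + GAMMA * value[next_state] - value[state]
    value[state] += ALPHA * td_error      # move value toward the TD target
print(value)
```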

Performance Improvement of Eye Tracking System using Reinforcement Learning (강화학습을 이용한 눈동자 추적 시스템의 성능향상)

  • Shin, Hak-Chul;Shen, Yan;Khim, Sarang;Sung, WonJun;Ahmed, Minhaz Uddin;Hong, Yo-Hoon;Rhee, Phill-Kyu
    • The Journal of the Institute of Internet, Broadcasting and Communication
    • /
    • v.13 no.2
    • /
    • pp.171-179
    • /
    • 2013
  • The performance of recognition and image processing technology depends on illumination variation. One of the most important factors is the parameter setting of the algorithms; depending on the selected values, the system achieves different levels of recognition accuracy. In this paper, we propose a performance improvement for an eye tracking system whose accuracy depends on environmental conditions such as the person, location, and illumination. The optimized threshold parameter was determined using reinforcement learning. When the system accuracy degrades, reinforcement learning is used to retrain the parameter values. According to the experimental results, the performance of the eye tracking system can be improved by 3% to 14% using reinforcement learning. The improved eye tracking system can be effectively used for human-computer interaction.
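
The parameter-tuning loop can be sketched as a simple epsilon-greedy search over candidate thresholds. The accuracy function is a synthetic stand-in for measured tracking accuracy, and the threshold grid is assumed.

```python
import random

# Sketch: tune a single threshold parameter with an epsilon-greedy bandit,
# standing in for the reinforcement learning step described above. The
# accuracy curve and the candidate thresholds are invented.
random.seed(42)
THRESHOLDS = [60, 80, 100, 120, 140]
q = {t: 0.0 for t in THRESHOLDS}      # running accuracy estimate per threshold
n = {t: 0 for t in THRESHOLDS}        # sample count per threshold

def measured_accuracy(threshold):
    # Synthetic stand-in: peaks at threshold 100, with measurement noise.
    return 0.9 - 0.0001 * (threshold - 100) ** 2 + random.gauss(0, 0.02)

for step in range(500):
    t = random.choice(THRESHOLDS) if random.random() < 0.1 else max(q, key=q.get)
    reward = measured_accuracy(t)
    n[t] += 1
    q[t] += (reward - q[t]) / n[t]    # incremental mean update

print(max(q, key=q.get))              # typically settles on the best threshold
```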