• Title/Summary/Keyword: Value-based reinforcement

Comparison of value-based Reinforcement Learning Algorithms in Cart-Pole Environment

  • Byeong-Chan Han;Ho-Chan Kim;Min-Jae Kang
    • International Journal of Internet, Broadcasting and Communication / v.15 no.3 / pp.166-175 / 2023
  • Reinforcement learning can be applied to a wide variety of problems. However, its fundamental limitation is that it is difficult to derive an answer within a given time because real-world problems are too complex. With the development of neural network technology, research on deep reinforcement learning, which combines deep learning with reinforcement learning, is receiving a lot of attention. In this paper, two types of neural networks are combined with reinforcement learning, and their characteristics are compared and analyzed against existing value-based reinforcement learning algorithms. The two types of neural networks are FNN and CNN, and the existing reinforcement learning algorithms are SARSA and Q-learning.
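
The abstract above names SARSA and Q-learning as the two tabular value-based baselines. As a minimal sketch of how their update rules differ (not the authors' implementation; the function names, the list-of-lists Q-table, and the `alpha`/`gamma` defaults are assumptions made for illustration):

```python
# Hypothetical tabular Q-table: Q[state][action] -> estimated return.

def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.99):
    """On-policy update: bootstrap from the action actually taken next."""
    target = r + gamma * Q[s_next][a_next]
    Q[s][a] += alpha * (target - Q[s][a])


def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    """Off-policy update: bootstrap from the greedy action in the next state."""
    target = r + gamma * max(Q[s_next])
    Q[s][a] += alpha * (target - Q[s][a])
```

The only difference is the bootstrap term: SARSA uses the value of the next action the agent really takes, while Q-learning uses the maximum over next actions regardless of what is taken.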

Solving Survival Gridworld Problem Using Hybrid Policy Modified Q-Based Reinforcement

  • Montero, Vince Jebryl;Jung, Woo-Young;Jeong, Yong-Jin
    • Journal of IKEEE / v.23 no.4 / pp.1150-1156 / 2019
  • This paper explores a model-free value-based approach for solving the survival gridworld problem. The survival gridworld problem poses a challenge that involves taking risks to gain better rewards. The classic value-based approach in model-free reinforcement learning assumes minimal-risk decisions. The proposed method applies hybrid on-policy and off-policy updates to experience roll-outs using a modified Q-based update equation that introduces a parametric linear rectifier and a motivational discount. The significance of this approach is that it allows model-free training of agents that take into account risk factors and motivated exploration to make better path decisions. Experiments suggest that the proposed method achieved better exploration and path selection, resulting in higher episode scores than classic off-policy and on-policy Q-based updates.

The investigation of pH threshold value on the corrosion of steel reinforcement in concrete

  • Pu, Qi;Yao, Yan;Wang, Ling;Shi, Xingxiang;Luo, Jingjing;Xie, Yifei
    • Computers and Concrete / v.19 no.3 / pp.257-262 / 2017
  • The aim of this study is to investigate the pH threshold value for the corrosion of steel reinforcement in concrete. A method was designed to obtain the pH value of the pore solution at the location of the steel in concrete. The pH values of the pore solution at the location of the steel were then changed by exposing the samples to an accelerated carbonation environment (CO2 5%, RH 40%) for different periods. On this basis, the pH threshold value for the corrosion of steel reinforcement was examined by half-cell potential and electrochemical impedance spectroscopy (EIS). The results indicated that the pH threshold value for the initial corrosion of steel reinforcement in concrete was 11.21. However, in the carbonated concrete, agreement could be found among the detection methods mentioned above on whether steel corrosion had initiated.

Performance Enhancement of CSMA/CA MAC Protocol Based on Reinforcement Learning

  • Kim, Tae-Wook;Hwang, Gyung-Ho
    • Journal of information and communication convergence engineering / v.19 no.1 / pp.1-7 / 2021
  • Reinforcement learning is an area of machine learning that studies how an intelligent agent takes actions in a given environment to maximize the cumulative reward. In this paper, we propose a new MAC protocol based on the Q-learning technique of reinforcement learning to improve the performance of the IEEE 802.11 wireless LAN CSMA/CA MAC protocol. Furthermore, the operation of each access point (AP) and station is proposed. The AP adjusts the value of the contention window (CW), which is the range for determining the backoff number of the station, according to the wireless traffic load. The station improves the performance by selecting an optimal backoff number with the lowest packet collision rate and the highest transmission success rate through Q-learning within the CW value transmitted from the AP. The result of the performance evaluation through computer simulations showed that the proposed scheme has a higher throughput than that of the existing CSMA/CA scheme.
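
The station-side behavior described above (choosing a backoff number inside the CW announced by the AP, and learning from collisions and successes) can be sketched roughly as follows. This is a simplified, stateless illustration and not the protocol from the paper: the class name `BackoffAgent`, the +1/-1 reward, the epsilon-greedy rule, and all parameter defaults are assumptions.

```python
import random


class BackoffAgent:
    """Hypothetical station that learns one Q-value per backoff number."""

    def __init__(self, cw_max, alpha=0.1, epsilon=0.1, seed=None):
        self.q = [0.0] * cw_max      # Q-value for each candidate backoff number
        self.alpha = alpha           # learning rate (assumed value)
        self.epsilon = epsilon       # exploration probability (assumed value)
        self.rng = random.Random(seed)

    def choose(self, cw):
        """Pick a backoff number in [0, cw) for the CW announced by the AP."""
        cw = max(1, min(cw, len(self.q)))
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(cw)      # explore
        window = self.q[:cw]
        return window.index(max(window))       # exploit the best-known backoff

    def update(self, backoff, success):
        """Assumed reward: +1 for a successful transmission, -1 for a collision."""
        reward = 1.0 if success else -1.0
        self.q[backoff] += self.alpha * (reward - self.q[backoff])
```

A station would call `choose(cw)` before each transmission attempt and `update(backoff, success)` once the outcome is known; a full Q-learning formulation would also condition on a state such as the observed traffic load.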

An Artificial Intelligence Game Agent Using CNN Based Records Learning and Reinforcement Learning (CNN 기반 기보학습 및 강화학습을 이용한 인공지능 게임 에이전트)

  • Jeon, Youngjin;Cho, Youngwan
    • Journal of IKEEE / v.23 no.4 / pp.1187-1194 / 2019
  • This paper proposes a CNN architecture as the value function network of an artificial intelligence Othello game agent, together with a learning scheme based on a reinforcement learning algorithm. We propose an approach that constructs the value function network by using a CNN to learn the records of professional players' real games, and an approach that improves the network parameters by learning from self-play with a reinforcement learning algorithm. The performance of the CNN value function network was compared with an existing ANN by having two agents, one using each network, play against each other. As a result, the winning rate of the CNN agent was 69.7% and 72.1% as black and white, respectively. In addition, applying reinforcement learning improved the agent's performance, yielding winning rates of 100% and 78%, respectively, against the network-based agent without reinforcement learning.

Study on fracture characteristics of reinforced concrete wedge splitting tests

  • HU, Shaowei;XU, Aiqing;HU, Xin;YIN, Yangyang
    • Computers and Concrete / v.18 no.3 / pp.337-354 / 2016
  • To study how the addition of reinforcement influences the fracture properties of reinforced concrete wedge splitting test specimens, and how steel bars restrict crack propagation, 7 groups of reinforced concrete specimens with different reinforcement positions and 1 group of plain concrete specimens with the same size factors were designed and constructed for the tests. Based on the double-K fracture criterion and the tests, a fracture toughness calculation model suitable for reinforced concrete wedge splitting tensile specimens was obtained. The results show that the values of the initial cracking load Pini and the unstable fracture load Pun decrease gradually with the distance of the reinforcement from the specimen's top. Compared with plain concrete specimens, the addition of a steel bar can reduce the value of the initial fracture toughness KIini, but significantly increases the values of the critical effective crack length ac and the unstable fracture toughness KIun. For tensile concrete members, the anti-cracking effect of reinforcement acts mainly after cracking; fracture initiation is best prevented when the steel bar is placed in the middle of the crack, and crack extension is best inhibited when the reinforcement crosses the crack and is located away from the crack tip.

Fault-tolerant control system for once-through steam generator based on reinforcement learning algorithm

  • Li, Cheng;Yu, Ren;Yu, Wenmin;Wang, Tianshu
    • Nuclear Engineering and Technology / v.54 no.9 / pp.3283-3292 / 2022
  • Based on the Deep Q-Network (DQN) algorithm of reinforcement learning, an active fault-tolerance method with incremental action is proposed for the control system of the once-through steam generator (OTSG) with sensor faults. In this paper, we first establish the OTSG model as the interaction environment for the reinforcement learning agent. The agent chooses an action according to the system state obtained by the pressure sensor; the incremental action gradually approaches the optimal strategy for the current fault, and the agent then updates the network using the different rewards obtained during the interaction. In this way, the active fault-tolerant control process of the OTSG is transformed into the reinforcement learning agent's decision-making process. Comparison experiments against the traditional reinforcement learning (RL) algorithm with fixed strategies show that the active fault-tolerant controller designed in this paper can control accurately and rapidly under sensor faults, so that the pressure of the OTSG is stabilized near the set-point value and the OTSG runs normally and stably.

Assessment of reliability-based FRP reinforcement ratio for concrete structures with recycled coarse aggregate

  • Ju, Minkwan;Park, Kyoungsoo;Lee, Kihong;Ahn, Ki Yong;Sim, Jongsung
    • Structural Engineering and Mechanics / v.69 no.4 / pp.399-405 / 2019
  • The present study assessed the reliability-based reinforcement ratio of FRP reinforced concrete structures using recycled coarse aggregate (RCA) concrete. The statistical characteristics of FRP bars and RCA concrete were collected from the previous literature, and the mean value and standard deviation were employed for the reliability analysis. These statistics can be regarded as the material uncertainty for configuring the probability distribution model. The target structure is a railway bridge with a double T-beam section. The replacement ratios of RCA were 0%, 30%, 50%, and 100%. From the probability distribution analysis, the reliability-based reinforcement ratios of FRP bars were assessed for four cases according to the replacement ratio of RCA. The reinforcement ratio of FRP bars at RCA 100% was about 17.3% higher than at RCA 0%, while the compressive strength at RCA 100% decreased by up to 27.5% relative to RCA 0%. It was found that the reduction in the compressive strength of RCA concrete could be compensated for by increasing the reinforcement ratio of FRP bars. This relationship obtained by the reliability analysis can serve as useful information in the structural design of FRP bar reinforced concrete structures using RCA concrete.

Reinforcement Learning based on Deep Deterministic Policy Gradient for Roll Control of Underwater Vehicle (수중운동체의 롤 제어를 위한 Deep Deterministic Policy Gradient 기반 강화학습)

  • Kim, Su Yong;Hwang, Yeon Geol;Moon, Sung Woong
    • Journal of the Korea Institute of Military Science and Technology / v.24 no.5 / pp.558-568 / 2021
  • Existing underwater vehicle controllers are designed by linearizing the nonlinear dynamics model around a specific motion section. Since a linear controller has unstable control performance in the transient state, various studies have been conducted to overcome this problem. Recently, there have been studies that use reinforcement learning to improve control performance in the transient state. Reinforcement learning can be largely divided into value-based reinforcement learning and policy-based reinforcement learning. In this paper, we propose a roll controller for an underwater vehicle based on Deep Deterministic Policy Gradient (DDPG), which learns the control policy and can show stable control performance in various situations and environments. The performance of the proposed DDPG-based roll controller was verified through simulation and compared with existing roll controllers based on PID and on DQN with Normalized Advantage Functions.

Experimental study on flexural strength of reinforced modular composite profiled beams

  • Ahn, Hyung-Joon;Ryu, Soo-Hyun
    • Steel and Composite Structures / v.8 no.4 / pp.313-328 / 2008
  • This study suggests a bending reinforcement method for composite profile beams in which the concept of prefabrication is introduced. The profile can be used in place of framework, is effective in improving shear and bending strength, and is advantageous for long-term deflection. In the experiment, MPB-CB2 with the improved module had higher strength and ductility than the previously published MPB-CB and MPB-LB. For bending reinforcement with deformed bars and built-up T-shape sections based on MPB-CB2, the MPB-RB series reinforced with deformed bars had higher initial stiffness, bending strength, and ductility than the MPB-RT series. The smaller reinforcement effect of the MPB-RT series might be caused by poor concrete filling at the bottom of the built-up T-shape. In a comparison of theoretical and experimental values using minimum yield strength, the ratio of experimental to theoretical value was 0.9 or higher except for MPB-RB16 and MPB-RT16, which have more reinforcement relative to the section; thus the reinforced modular composite profiled beam is deemed highly applicable on the basis of minimum yield strength.