• Title/Abstract/Keywords: Value-based reinforcement

165 search results, processing time 0.022 s

Comparison of value-based Reinforcement Learning Algorithms in Cart-Pole Environment

  • Byeong-Chan Han;Ho-Chan Kim;Min-Jae Kang
    • International Journal of Internet, Broadcasting and Communication
    • /
    • Vol. 15, No. 3
    • /
    • pp.166-175
    • /
    • 2023
  • Reinforcement learning can be applied to a wide variety of problems. However, its fundamental limitation is that it is difficult to derive an answer within a given time because real-world problems are too complex. With the development of neural network technology, research on deep reinforcement learning, which combines deep learning with reinforcement learning, is receiving a lot of attention. In this paper, two types of neural networks are combined with reinforcement learning, and their characteristics are compared and analyzed against existing value-based reinforcement learning algorithms. The two types of neural networks are FNN and CNN, and the existing reinforcement learning algorithms are SARSA and Q-learning.
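The two existing algorithms named in this abstract differ only in how they form the bootstrap target. The following is a minimal tabular sketch of both updates, not the paper's implementation: the function names, the NumPy Q-table layout `Q[state, action]`, and the default values of `alpha` and `gamma` are illustrative assumptions.

```python
import numpy as np


def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.99):
    """On-policy update: bootstrap from the action actually chosen in s_next."""
    target = r + gamma * Q[s_next, a_next]
    Q[s, a] += alpha * (target - Q[s, a])
    return Q


def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    """Off-policy update: bootstrap from the greedy (maximum) value in s_next."""
    target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (target - Q[s, a])
    return Q
```

When the next action happens to be the greedy one, the two updates coincide; they diverge only when the behavior policy explores.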

Solving Survival Gridworld Problem Using Hybrid Policy Modified Q-Based Reinforcement

  • Montero, Vince Jebryl;Jung, Woo-Young;Jeong, Yong-Jin
    • 전기전자학회논문지 (Journal of IKEEE)
    • /
    • Vol. 23, No. 4
    • /
    • pp.1150-1156
    • /
    • 2019
  • This paper explores a model-free, value-based approach to solving the survival gridworld problem. The survival gridworld problem poses a challenge that involves taking risks to gain better rewards. The classic value-based approach in model-free reinforcement learning assumes minimal-risk decisions. The proposed method applies hybrid on-policy and off-policy updates to experience roll-outs, using a modified Q-based update equation that introduces a parametric linear rectifier and a motivational discount. The significance of this approach is that it allows model-free training of agents that take risk factors and motivated exploration into account to make better path decisions. Experiments suggest that the proposed method achieves better exploration and path selection, resulting in higher episode scores than classic off-policy and on-policy Q-based updates.

The investigation of pH threshold value on the corrosion of steel reinforcement in concrete

  • Pu, Qi;Yao, Yan;Wang, Ling;Shi, Xingxiang;Luo, Jingjing;Xie, Yifei
    • Computers and Concrete
    • /
    • Vol. 19, No. 3
    • /
    • pp.257-262
    • /
    • 2017
  • The aim of this study is to investigate the pH threshold value for the corrosion of steel reinforcement in concrete. A method was designed to obtain the pH value of the pore solution at the location of the steel in concrete. The pH values of the pore solution at the location of the steel were then changed by exposing the samples to an accelerated carbonation environment (CO2 5%, RH 40%) for different periods. On this basis, the pH threshold value for the corrosion of steel reinforcement was examined using half-cell potential and electrochemical impedance spectroscopy (EIS). The results indicate that the pH threshold value for the initial corrosion of steel reinforcement in concrete was 11.21. However, in the carbonated concrete, agreement could be found among the detection methods mentioned above on whether steel corrosion had initiated.

Performance Enhancement of CSMA/CA MAC Protocol Based on Reinforcement Learning

  • Kim, Tae-Wook;Hwang, Gyung-Ho
    • Journal of information and communication convergence engineering
    • /
    • Vol. 19, No. 1
    • /
    • pp.1-7
    • /
    • 2021
  • Reinforcement learning is an area of machine learning that studies how an intelligent agent takes actions in a given environment to maximize the cumulative reward. In this paper, we propose a new MAC protocol based on the Q-learning technique of reinforcement learning to improve the performance of the IEEE 802.11 wireless LAN CSMA/CA MAC protocol. Furthermore, the operation of each access point (AP) and station is proposed. The AP adjusts the value of the contention window (CW), which is the range for determining the backoff number of the station, according to the wireless traffic load. The station improves the performance by selecting an optimal backoff number with the lowest packet collision rate and the highest transmission success rate through Q-learning within the CW value transmitted from the AP. The result of the performance evaluation through computer simulations showed that the proposed scheme has a higher throughput than that of the existing CSMA/CA scheme.
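The station-side idea in this abstract, learning which backoff number within the contention window yields the fewest collisions, can be sketched as a stateless Q-learning agent. This is a simplified illustration under stated assumptions, not the protocol proposed in the paper: the class name `BackoffAgent`, the +1/-1 reward, the epsilon-greedy exploration, and the absence of a state and discount term are all hypothetical choices.

```python
import random


class BackoffAgent:
    """Stateless Q-learning over backoff numbers 0..cw-1 (illustrative only)."""

    def __init__(self, cw, alpha=0.1, epsilon=0.1, seed=0):
        self.q = [0.0] * cw          # one Q-value per candidate backoff number
        self.alpha = alpha           # learning rate
        self.epsilon = epsilon       # exploration probability
        self.rng = random.Random(seed)

    def choose(self):
        """Pick a backoff number: explore with probability epsilon, else greedy."""
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.q))
        return max(range(len(self.q)), key=self.q.__getitem__)

    def update(self, backoff, success):
        """Move Q toward +1 after a successful transmission, -1 after a collision."""
        reward = 1.0 if success else -1.0
        self.q[backoff] += self.alpha * (reward - self.q[backoff])
```

In a simulation loop, each station would call `choose()` before contending and `update()` once it learns whether the frame collided; a new CW value announced by the AP would require resizing the table, which this sketch omits.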

An Artificial Intelligence Game Agent Using CNN Based Records Learning and Reinforcement Learning

  • 전영진;조영완
    • 전기전자학회논문지 (Journal of IKEEE)
    • /
    • Vol. 23, No. 4
    • /
    • pp.1187-1194
    • /
    • 2019
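  • To implement an artificial intelligence Othello game agent, this paper trains a CNN on the game records of professional players, uses the trained network as the basis for evaluating board positions, and adopts a decision-making structure that finds the best move in the current state through minimax search. To improve on this, a self-play learning method based on reinforcement learning theory is proposed and applied. To evaluate the performance of game-record learning, the proposed implementation was compared through matches against a method that uses a conventional ANN as the value-evaluation network, achieving win rates of 69.7% when playing black and 72.1% when playing white. In addition, the network obtained by applying the proposed reinforcement learning was compared with ANN-based and CNN-based value-network agents trained without reinforcement learning, achieving win rates of 100% and 78%, respectively, which confirms the performance improvement.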
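The decision structure described here, minimax search that scores leaf positions with a learned evaluator, can be sketched generically. This is a minimal sketch, not the paper's agent: the callables `legal_moves`, `apply_move`, and `evaluate` are hypothetical stand-ins (in the paper's setting `evaluate` would be the trained value network), and treating a position with no legal moves as a leaf ignores Othello's pass rule.

```python
def minimax(state, depth, maximizing, legal_moves, apply_move, evaluate):
    """Depth-limited minimax. Returns (value, best_move).

    legal_moves(state, maximizing) -> list of moves for the side to play
    apply_move(state, move, maximizing) -> resulting state
    evaluate(state) -> score from the maximizing player's point of view
    """
    moves = legal_moves(state, maximizing)
    if depth == 0 or not moves:
        # Leaf: defer to the evaluation function (e.g. a value network).
        return evaluate(state), None

    best_move = None
    best_value = float("-inf") if maximizing else float("inf")
    for move in moves:
        child = apply_move(state, move, maximizing)
        value, _ = minimax(child, depth - 1, not maximizing,
                           legal_moves, apply_move, evaluate)
        if (maximizing and value > best_value) or \
           (not maximizing and value < best_value):
            best_value, best_move = value, move
    return best_value, best_move
```

Alpha-beta pruning would give the same result with fewer evaluations; it is left out to keep the sketch short.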

Study on fracture characteristics of reinforced concrete wedge splitting tests

  • HU, Shaowei;XU, Aiqing;HU, Xin;YIN, Yangyang
    • Computers and Concrete
    • /
    • Vol. 18, No. 3
    • /
    • pp.337-354
    • /
    • 2016
  • To study how added reinforcement influences the fracture properties of reinforced concrete wedge splitting test specimens, and how steel bars restrict crack propagation, 7 groups of reinforced concrete specimens with different reinforcement positions and 1 group of plain concrete specimens with the same size factors were designed and constructed for the tests. Based on the double-K fracture criterion and the tests, a fracture toughness calculation model suitable for reinforced concrete wedge splitting tensile specimens was obtained. The results show that the initial cracking load Pini and the unstable fracture load Pun decrease gradually as the distance of the reinforcement from the specimen's top increases. Compared with plain concrete specimens, the addition of a steel bar can reduce the initial fracture toughness KIini but significantly increases the critical effective crack length ac and the unstable fracture toughness KIun. For tensile concrete members, the anti-cracking effect of reinforcement acts mainly after cracking: fracture initiation is best prevented when the steel bar is placed in the middle of the crack, and crack extension is best inhibited when the reinforcement crosses the crack and is located away from the crack tip.

Fault-tolerant control system for once-through steam generator based on reinforcement learning algorithm

  • Li, Cheng;Yu, Ren;Yu, Wenmin;Wang, Tianshu
    • Nuclear Engineering and Technology
    • /
    • Vol. 54, No. 9
    • /
    • pp.3283-3292
    • /
    • 2022
  • Based on the Deep Q-Network (DQN) algorithm of reinforcement learning, an active fault-tolerance method with incremental action is proposed for the control system of the once-through steam generator (OTSG) under sensor faults. In this paper, we first establish the OTSG model as the interaction environment for the reinforcement learning agent. The agent chooses an action according to the system state obtained by the pressure sensor, the incremental action gradually approaches the optimal strategy for the current fault, and the agent then updates the network using the different rewards obtained during the interaction. In this way, the active fault-tolerant control process of the OTSG is transformed into the decision-making process of the reinforcement learning agent. Comparison experiments against the traditional reinforcement learning (RL) algorithm with fixed strategies show that the active fault-tolerant controller designed in this paper can control accurately and rapidly under sensor faults, so that the pressure of the OTSG is stabilized near the set-point value and the OTSG runs normally and stably.

Assessment of reliability-based FRP reinforcement ratio for concrete structures with recycled coarse aggregate

  • Ju, Minkwan;Park, Kyoungsoo;Lee, Kihong;Ahn, Ki Yong;Sim, Jongsung
    • Structural Engineering and Mechanics
    • /
    • Vol. 69, No. 4
    • /
    • pp.399-405
    • /
    • 2019
  • The present study assessed the reliability-based reinforcement ratio of FRP reinforced concrete structures made with recycled coarse aggregate (RCA) concrete. The statistical characteristics of FRP bars and RCA concrete were taken from the previous literature, and the mean value and standard deviation were employed for the reliability analysis. These statistics can be regarded as the material uncertainty used to configure the probability distribution model. The target structure is a railway bridge with a double T-beam section. The replacement ratios of RCA were 0%, 30%, 50%, and 100%. From the probability distribution analysis, the reliability-based reinforcement ratios of FRP bars were assessed for four cases according to the replacement ratio of RCA. The reinforcement ratio of FRP bars at RCA 100% was about 17.3% higher than at RCA 0%, while the compressive strength at RCA 100% was up to 27.5% lower than at RCA 0%. It was found that the reduced compressive strength of RCA concrete could be compensated by increasing the reinforcement ratio of FRP bars. This relationship, obtained by the reliability analysis, can serve as useful information in the structural design of FRP bar reinforced concrete structures made with RCA concrete.

Reinforcement Learning based on Deep Deterministic Policy Gradient for Roll Control of Underwater Vehicle

  • 김수용;황연걸;문성웅
    • 한국군사과학기술학회지 (Journal of the Korea Institute of Military Science and Technology)
    • /
    • Vol. 24, No. 5
    • /
    • pp.558-568
    • /
    • 2021
  • Existing underwater vehicle controllers are designed by linearizing the nonlinear dynamics model for a specific motion section. Since a linear controller has unstable control performance in the transient state, various studies have been conducted to overcome this problem. Recently, there have been studies that use reinforcement learning to improve control performance in the transient state. Reinforcement learning can be broadly divided into value-based reinforcement learning and policy-based reinforcement learning. In this paper, we propose a roll controller for an underwater vehicle based on Deep Deterministic Policy Gradient (DDPG), which learns the control policy and shows stable control performance in various situations and environments. The performance of the proposed DDPG-based roll controller was verified through simulation and compared with existing roll controllers based on PID and on DQN with Normalized Advantage Functions.

Experimental study on flexural strength of reinforced modular composite profiled beams

  • Ahn, Hyung-Joon;Ryu, Soo-Hyun
    • Steel and Composite Structures
    • /
    • Vol. 8, No. 4
    • /
    • pp.313-328
    • /
    • 2008
  • This study proposes a bending reinforcement method by applying bending reinforcement to a composite profiled beam that incorporates the concept of prefabrication. The profile can be used in place of formwork, is effective in improving shear and bending strength, and is advantageous for long-term deflection. In the experiments, MPB-CB2, which has an improved module, showed higher strength and ductility than the previously published MPB-CB and MPB-LB. For bending reinforcement based on MPB-CB2 with a deformed bar or a built-up T-shape section, the MPB-RB series reinforced with deformed bars had higher initial stiffness, bending strength, and ductility than the MPB-RT series. The smaller reinforcement effect of the MPB-RT series might be caused by poor concrete filling at the bottom of the built-up T-shape. Comparing theoretical and experimental values based on minimum yield strength, the ratio of experimental to theoretical value was 0.9 or higher, except for MPB-RB16 and MPB-RT16, which have more reinforcement relative to the section. The reinforced modular composite profiled beam is therefore deemed highly applicable on the basis of minimum yield strength.