• Title/Summary/Keyword: Reinforcement Value


An Artificial Intelligence Game Agent Using CNN Based Records Learning and Reinforcement Learning (CNN 기반 기보학습 및 강화학습을 이용한 인공지능 게임 에이전트)

  • Jeon, Youngjin;Cho, Youngwan
    • Journal of IKEEE
    • /
    • v.23 no.4
    • /
    • pp.1187-1194
    • /
    • 2019
  • This paper proposes a CNN architecture as the value-function network of an artificial-intelligence Othello game agent, together with a learning scheme based on reinforcement learning. The value-function network is first constructed by training the CNN on records of professional players' real games, and its parameters are then refined through self-play using a reinforcement learning algorithm. The performance of the CNN value-function network was compared with that of an existing ANN by having agents using each network play against each other. The CNN agent achieved winning rates of 69.7% as black and 72.1% as white. Furthermore, after applying reinforcement learning, the agent's performance improved to winning rates of 100% and 78%, respectively, against the record-trained agent without reinforcement learning.
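The two-stage scheme described above can be sketched in miniature: first regress a value function toward the final outcomes of recorded games, then refine it along self-play trajectories with a TD-style update. This is a toy tabular stand-in, not the paper's CNN; the state encoding, learning rates, and reward convention (+1 black win, -1 white win) are assumptions.

```python
def train_from_records(records, lr=0.1, epochs=10):
    """Supervised stage: regress V(s) toward the final game outcome,
    standing in for the CNN trained on professional game records.
    `records` is a list of (list_of_states, outcome) pairs."""
    V = {}
    for _ in range(epochs):
        for states, outcome in records:
            for s in states:
                v = V.get(s, 0.0)
                V[s] = v + lr * (outcome - v)
    return V

def refine_by_self_play(V, episode, outcome, lr=0.05):
    """RL stage: TD(0)-style updates along one self-play trajectory,
    with a final update toward the actual game result."""
    for s, s_next in zip(episode, episode[1:]):
        v = V.get(s, 0.0)
        V[s] = v + lr * (V.get(s_next, 0.0) - v)
    last = episode[-1]
    v = V.get(last, 0.0)
    V[last] = v + lr * (outcome - v)
    return V
```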

Goal-Directed Reinforcement Learning System (목표지향적 강화학습 시스템)

  • Lee, Chang-Hoon
    • The Journal of the Institute of Internet, Broadcasting and Communication
    • /
    • v.10 no.5
    • /
    • pp.265-270
    • /
    • 2010
  • Reinforcement learning learns through trial-and-error interaction with a dynamic environment. In such environments, reinforcement learning methods such as TD-learning and TD(λ)-learning therefore learn faster than conventional stochastic learning methods. However, because many of the proposed reinforcement learning algorithms provide the reinforcement value only when the learning agent reaches its goal state, most of them converge to the optimal solution too slowly. In this paper, we present the GDRLS algorithm for finding the shortest path faster in a maze environment. GDRLS selects the candidate states that can lie on the shortest path in the maze environment and learns only those candidate states. Experiments show that GDRLS finds the shortest path faster than TD-learning and TD(λ)-learning in maze environments.
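The core idea above, restricting learning to a set of candidate states, can be sketched with tabular Q-learning that simply ignores transitions leaving the candidate set. The grid layout, goal-only reward, and hyperparameters below are illustrative assumptions, not the paper's GDRLS specification.

```python
import random

def q_learning_on_candidates(grid_w, grid_h, start, goal, candidates,
                             episodes=500, alpha=0.5, gamma=0.9, eps=0.2):
    """Tabular Q-learning over only the candidate states of a maze.
    Transitions to non-candidate cells are simply not taken, so the
    table (and the learning effort) stays small."""
    actions = [(0, 1), (0, -1), (1, 0), (-1, 0)]
    Q = {(s, a): 0.0 for s in candidates for a in actions}
    rng = random.Random(0)
    for _ in range(episodes):
        s = start
        for _ in range(4 * grid_w * grid_h):   # step cap per episode
            if s == goal:
                break
            if rng.random() < eps:             # epsilon-greedy exploration
                a = rng.choice(actions)
            else:
                a = max(actions, key=lambda a: Q[(s, a)])
            s2 = (s[0] + a[0], s[1] + a[1])
            if not (0 <= s2[0] < grid_w and 0 <= s2[1] < grid_h) \
                    or s2 not in candidates:
                continue                       # outside grid or not a candidate
            r = 1.0 if s2 == goal else 0.0     # reward only at the goal
            best_next = max(Q[(s2, b)] for b in actions)
            Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
            s = s2
    return Q
```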

Experimental study on flexural strength of reinforced modular composite profiled beams

  • Ahn, Hyung-Joon;Ryu, Soo-Hyun
    • Steel and Composite Structures
    • /
    • v.8 no.4
    • /
    • pp.313-328
    • /
    • 2008
  • This study proposes a bending reinforcement method for composite profiled beams that incorporate the concept of prefabrication. The profile can serve in place of formwork, effectively improves shear and bending strength, and is advantageous for long-term deflection. The experiments showed that MPB-CB2, with an improved module, had higher strength and ductility than the previously published MPB-CB and MPB-LB. When bending reinforcement with deformed bars and built-up T-shaped sections was applied on the basis of MPB-CB2, the MPB-RB series reinforced with deformed bars showed higher initial stiffness, bending strength, and ductility than the MPB-RT series. The smaller reinforcement effect of the MPB-RT series is likely due to poor concrete filling at the bottom of the built-up T-shape. Comparing experimental values with theoretical values based on minimum yield strength, the ratio of experimental to theoretical value was 0.9 or higher, except for MPB-RB16 and MPB-RT16, which are heavily reinforced relative to their sections; thus the reinforced modular composite profiled beam is deemed highly applicable on the basis of minimum yield strength.

On-line Reinforcement Learning for Cart-pole Balancing Problem (카트-폴 균형 문제를 위한 실시간 강화 학습)

  • Kim, Byung-Chun;Lee, Chang-Hoon
    • The Journal of the Institute of Internet, Broadcasting and Communication
    • /
    • v.10 no.4
    • /
    • pp.157-162
    • /
    • 2010
  • The cart-pole balancing problem is a pseudo-standard benchmark for control methods including genetic algorithms, artificial neural networks, and reinforcement learning. In this paper, we propose a novel approach that uses online reinforcement learning (OREL) to solve the cart-pole balancing problem. The objective is to analyze how the OREL learning system behaves on this problem. Experiments show that OREL approximates the optimal value function faster than Q-learning.
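The "online" aspect, applying a value update at every time step rather than at the end of an episode, can be sketched as a single TD(0) update plus a coarse discretization of the continuous cart-pole state. The bin counts and state ranges below are common illustrative choices, not the paper's parameters.

```python
def online_td_update(V, s, r, s_next, alpha=0.1, gamma=0.99):
    """One TD(0) update applied immediately at each step (the online
    setting), rather than after the episode has finished."""
    v = V.get(s, 0.0)
    V[s] = v + alpha * (r + gamma * V.get(s_next, 0.0) - v)
    return V[s]

def discretize(x, x_dot, theta, theta_dot, bins=6):
    """Map a continuous cart-pole state to a coarse bin tuple so a
    tabular method can be used (ranges are assumptions)."""
    def bin_of(v, lo, hi):
        v = min(max(v, lo), hi)
        return int((v - lo) / (hi - lo) * (bins - 1))
    return (bin_of(x, -2.4, 2.4), bin_of(x_dot, -3.0, 3.0),
            bin_of(theta, -0.21, 0.21), bin_of(theta_dot, -3.0, 3.0))
```

In a simulation loop, `discretize` would be called on each observation and `online_td_update` applied before the next action is taken.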

Performance Enhancement of CSMA/CA MAC Protocol Based on Reinforcement Learning

  • Kim, Tae-Wook;Hwang, Gyung-Ho
    • Journal of information and communication convergence engineering
    • /
    • v.19 no.1
    • /
    • pp.1-7
    • /
    • 2021
  • Reinforcement learning is an area of machine learning that studies how an intelligent agent takes actions in a given environment to maximize the cumulative reward. In this paper, we propose a new MAC protocol based on the Q-learning technique of reinforcement learning to improve the performance of the IEEE 802.11 wireless LAN CSMA/CA MAC protocol, and we specify the corresponding operation of each access point (AP) and station. The AP adjusts the value of the contention window (CW), the range from which a station's backoff number is drawn, according to the wireless traffic load. Each station improves performance by using Q-learning to select, within the CW value transmitted from the AP, the optimal backoff number with the lowest packet collision rate and the highest transmission success rate. Performance evaluation through computer simulations showed that the proposed scheme achieves higher throughput than the existing CSMA/CA scheme.
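The station-side behavior described above can be sketched as a small Q-table over the backoff numbers inside the AP-announced CW: pick a backoff epsilon-greedily, then reinforce it on success or penalize it on collision. The reward values (+1 success, -1 collision) and the bandit-style update are assumptions for illustration, not the paper's exact scheme.

```python
import random

class BackoffLearner:
    """Q-learning over backoff numbers in [0, cw), as a station-side sketch."""

    def __init__(self, cw, alpha=0.1, eps=0.1, seed=0):
        self.q = [0.0] * cw            # one Q value per backoff number
        self.alpha, self.eps = alpha, eps
        self.rng = random.Random(seed)

    def choose_backoff(self):
        if self.rng.random() < self.eps:                 # explore
            return self.rng.randrange(len(self.q))
        return max(range(len(self.q)), key=self.q.__getitem__)  # exploit

    def update(self, backoff, success):
        """Reinforce a successful transmission, penalize a collision."""
        r = 1.0 if success else -1.0
        self.q[backoff] += self.alpha * (r - self.q[backoff])
```

When the AP announces a new CW, the station would rebuild the learner (or resize the table) for the new range.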

Reinforcement Learning Using State Space Compression (상태 공간 압축을 이용한 강화학습)

  • Kim, Byeong-Cheon;Yun, Byeong-Ju
    • The Transactions of the Korea Information Processing Society
    • /
    • v.6 no.3
    • /
    • pp.633-640
    • /
    • 1999
  • Reinforcement learning learns through trial-and-error interaction with a dynamic environment. In such environments, reinforcement learning methods such as Q-learning and TD (Temporal Difference) learning therefore learn faster than conventional stochastic learning methods. However, because many of the proposed reinforcement learning algorithms provide the reinforcement value only when the learning agent reaches its goal state, most of them converge to the optimal solution too slowly. In this paper, we present the COMREL (COMpressed REinforcement Learning) algorithm for finding the shortest path quickly in a maze environment: it selects the candidate states that can lie on the shortest path in the compressed maze environment and learns only those candidate states. Compared with the existing Q-learning and Prioritized Sweeping algorithms, COMREL shortened the learning time considerably.
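The compression idea can be sketched as a many-to-one mapping from maze cells to coarser abstract states: values are learned and stored per block, and concrete cells read the value of their block. The 2x2 block size is an arbitrary illustrative choice, not the paper's compression scheme.

```python
def compress(state, block=2):
    """Map a concrete maze cell (x, y) to its compressed block state."""
    x, y = state
    return (x // block, y // block)

def compressed_value_lookup(V_compressed, state, block=2):
    """Read the learned value of a concrete cell from the compressed table."""
    return V_compressed.get(compress(state, block), 0.0)
```

Because many cells share one table entry, each update generalizes across the whole block, which is the source of the reported speedup (at the cost of resolution inside a block).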


Fault-tolerant control system for once-through steam generator based on reinforcement learning algorithm

  • Li, Cheng;Yu, Ren;Yu, Wenmin;Wang, Tianshu
    • Nuclear Engineering and Technology
    • /
    • v.54 no.9
    • /
    • pp.3283-3292
    • /
    • 2022
  • Based on the Deep Q-Network (DQN) reinforcement learning algorithm, an active fault-tolerance method with incremental actions is proposed for the control system of a once-through steam generator (OTSG) subject to sensor faults. We first establish the OTSG model as the interaction environment for the reinforcement learning agent. The agent chooses an action according to the system state obtained from the pressure sensor; the incremental actions allow it to gradually approach the optimal strategy for the current fault, and the agent updates its network using the rewards obtained during interaction. In this way, the active fault-tolerant control of the OTSG is transformed into the decision-making process of a reinforcement learning agent. Comparison experiments against a traditional reinforcement learning algorithm (RL) with fixed strategies show that the proposed active fault-tolerant controller acts accurately and rapidly under sensor faults, stabilizing the OTSG pressure near the set-point value so that the OTSG runs normally and stably.
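The incremental-action idea, where the agent outputs a small change to the control signal instead of an absolute value, can be sketched in a few lines. The increment set, actuator range, and pressure-tracking reward below are illustrative assumptions, not the paper's design.

```python
# Hypothetical discrete action set: decrease, hold, or increase the control.
INCREMENTS = [-0.1, 0.0, +0.1]

def apply_increment(control, action_index, lo=0.0, hi=1.0):
    """Add the chosen increment to the current control signal and clamp
    it to the actuator range, so the policy approaches the optimum
    gradually rather than jumping."""
    return min(max(control + INCREMENTS[action_index], lo), hi)

def pressure_reward(pressure, setpoint, tol=0.05):
    """Reward shaping that favors keeping the pressure near the set-point:
    full reward inside the tolerance band, negative error outside it."""
    err = abs(pressure - setpoint)
    return 1.0 if err <= tol else -err
```

A DQN would then learn Q-values over the three increments from the (possibly faulty) sensor reading.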

An Examination of the Minimum Reinforcement Ratio for Reinforced Concrete Flexural Members (철근콘크리트 휨부재의 최소철근비에 대한 고찰)

  • Choi, Seung-Won
    • Journal of the Korea institute for structural maintenance and inspection
    • /
    • v.21 no.6
    • /
    • pp.35-43
    • /
    • 2017
  • The minimum reinforcement ratio is an important design factor for preventing brittle failure in RC flexural members. In CDC and KHBDC, the minimum reinforcement ratio is presented by assuming an effective depth of the cross-section and a moment arm lever. This study proposes a more rational method in which the minimum reinforcement ratio is calculated from the material model and force equilibrium. The minimum reinforcement ratio obtained using the p-r curve in KHBDC is evaluated at about 52~80% of the current design code's value, which leads to a more economical design. The ductility capacity when this minimum reinforcement amount is placed is about 89% of the current design code's value, but the member ductility is 7 or more, which is sufficient. Therefore, the minimum reinforcement ratio based on the p-r curve is judged to offer theoretical rationality, safety, and economy in flexural member design.

Analytical Study of Behavior on Structure Reinforced Fiber Sheet (섬유시트 보강 구조체의 거동에 관한 해석적 연구)

  • Seo, Seung-Tag
    • Journal of the Korean Society of Industry Convergence
    • /
    • v.12 no.2
    • /
    • pp.107-112
    • /
    • 2009
  • Effective reinforcement methods are required to improve the durability of existing structures. Recently, continuous fiber sheets have been widely used on concrete structures as an earthquake-proof reinforcement method. This study examines, by FEM analysis, the suitability and effect of fiber sheets for concrete structures. The analysis results are as follows. All specimens failed in bending tension at mid-span. The ultimate strengths of the RC specimen and the reinforced RC specimen were 53.9 kN and 56.3 kN, respectively, somewhat lower than the calculated results by factors of 0.89 and 0.82. The mid-span deflection was approximately 0.2 mm, and the sheet-reinforced specimen behaved linearly up to a load of 20 kN. Stiffness did not decrease despite the occurrence of fine cracks, and the flexural stiffness of the reinforced beam increased until failure. Comparing calculated values with analysis values, the behavior is nearly identical in the elastic region, confirming the effectiveness of the analysis. At failure, cracks were distributed uniformly owing to the fiber-sheet reinforcement, and no stiffness decrease occurred.


Online Reinforcement Learning to Search the Shortest Path in Maze Environments (미로 환경에서 최단 경로 탐색을 위한 실시간 강화 학습)

  • Kim, Byeong-Cheon;Kim, Sam-Geun;Yun, Byeong-Ju
    • The KIPS Transactions:PartB
    • /
    • v.9B no.2
    • /
    • pp.155-162
    • /
    • 2002
  • Reinforcement learning is a learning method that uses trial and error to learn by interacting with dynamic environments. It is classified into online reinforcement learning and delayed reinforcement learning. In this paper, we propose an online reinforcement learning system (ONRELS: ONline REinforcement Learning System). ONRELS updates the estimated values of all the selectable (state, action) pairs before making a state transition at the current state. After compressing the state space of the maze environment, ONRELS learns by trial-and-error interaction with the compressed environment. Experiments show that ONRELS searches the shortest path faster than Q-learning using the TD-error and Q(λ)-learning using TD(λ) in maze environments.
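The distinctive step above, updating every selectable (state, action) pair at the current state before transitioning, can be sketched as follows. The environment callbacks (`selectable`, `reward`, `next_state`) and the greedy move after the batch update are illustrative assumptions, not the paper's exact procedure.

```python
def update_all_then_move(Q, s, selectable, reward, next_state,
                         alpha=0.5, gamma=0.9):
    """Update Q for ALL actions selectable at state s (the ONRELS-style
    pre-transition sweep), then return the greedy action to take."""
    for a in selectable(s):
        s2 = next_state(s, a)
        best = max((Q.get((s2, b), 0.0) for b in selectable(s2)), default=0.0)
        q = Q.get((s, a), 0.0)
        Q[(s, a)] = q + alpha * (reward(s, a) + gamma * best - q)
    # move using the freshly updated estimates
    return max(selectable(s), key=lambda a: Q[(s, a)])
```

In a full learner this call would sit inside the episode loop, with `s` replaced by `next_state(s, chosen_action)` after each move.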