• Title/Summary/Keyword: Q-learning algorithm

Development of Semi-Active Control Algorithm Using Deep Q-Network (Deep Q-Network를 이용한 준능동 제어알고리즘 개발)

  • Kim, Hyun-Su; Kang, Joo-Won
    • Journal of Korean Association for Spatial Structures / v.21 no.1 / pp.79-86 / 2021
  • The control performance of a smart tuned mass damper (TMD) depends mainly on the control algorithm, and many control strategies have been proposed for semi-active control devices. Recently, machine learning has begun to be applied to the development of vibration control algorithms. In this study, reinforcement learning was employed to develop a semi-active control algorithm for a smart TMD composed of a magnetorheological (MR) damper. For this purpose, an 11-story building structure with a smart TMD was selected to construct the reinforcement learning environment, and a time history analysis of the example structure subjected to earthquake excitation was conducted during the learning procedure. A deep Q-network (DQN) was used as the learning agent, and the command voltage sent to the MR damper is determined by the action produced by the DQN. Parametric studies on the hyper-parameters of the DQN were performed by numerical simulation. After sufficient training iterations with proper hyper-parameters, a DQN model was obtained that effectively controls the smart TMD to reduce the seismic responses of the example structure.
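
The abstract does not give the network architecture or state encoding; as a rough illustration of the idea only, the following sketch (all names, dimensions, and voltage levels are hypothetical) shows a small PyTorch Q-network choosing a discrete MR-damper command voltage epsilon-greedily:

```python
import torch
import torch.nn as nn

# Hypothetical setup: the state could be floor displacements/velocities,
# and the action a discrete command voltage for the MR damper.
VOLTAGE_LEVELS = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]  # assumed 0-5 V grid

class QNetwork(nn.Module):
    """Small MLP mapping a structural-response state to Q-values,
    one per candidate command voltage."""
    def __init__(self, state_dim: int, n_actions: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, n_actions),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)

def select_voltage(q_net: QNetwork, state: torch.Tensor, epsilon: float) -> float:
    """Epsilon-greedy choice of the MR-damper command voltage."""
    if torch.rand(1).item() < epsilon:
        idx = torch.randint(len(VOLTAGE_LEVELS), (1,)).item()
    else:
        with torch.no_grad():
            idx = q_net(state).argmax().item()
    return VOLTAGE_LEVELS[idx]
```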

The Development of Janggi Board Game Using Backpropagation Neural Network and Q Learning Algorithm (역전파 신경회로망과 Q학습을 이용한 장기보드게임 개발)

  • 황상문; 박인규; 백덕수; 진달복
    • Journal of the Institute of Electronics Engineers of Korea TE / v.39 no.1 / pp.83-90 / 2002
  • This paper proposes a strategy-learning method for the two-person, deterministic janggi board game based on the fusion of a back-propagation neural network and the Q-learning algorithm. Learning is accomplished simply through self-play. The system consists of two parts: a move generator, which generates the moves on the board, and a search kernel, which combines back-propagation and Q-learning with $\alpha\beta$ search to learn the evaluation function. While temporal-difference learning learns from the discrepancy between adjacent rewards, Q-learning acquires optimal policies even without prior knowledge of the effects of its moves on the environment, by learning the evaluation function from the augmented rewards. Over many games played during the learning procedure, the winning percentage proved, in general, to be linearly proportional to the amount of learning.
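
For reference, the one-step Q-learning update that the paper approximates with its back-propagation network is the standard tabular rule; a minimal sketch (state and action encodings hypothetical):

```python
from collections import defaultdict

# Standard tabular Q-learning; the paper approximates the Q-function
# with a back-propagation network instead of a table.
ALPHA, GAMMA = 0.1, 0.9          # learning rate and discount factor
Q = defaultdict(float)           # Q[(state, action)] -> value

def q_update(state, action, reward, next_state, next_actions):
    """One-step Q-learning: move Q(s,a) toward r + gamma * max_a' Q(s',a')."""
    best_next = max((Q[(next_state, a)] for a in next_actions), default=0.0)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
```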

Dodecagon-based Q-learning Algorithm using SVM for Object Search of Robot (로봇의 목표물 추적을 위한 SVM과 12각형 기반의 Q-learning 알고리즘)

  • Seo, Sang-Wook; Jang, In-Hun; Sim, Kwee-Bo
    • Proceedings of the Korean Institute of Intelligent Systems Conference / 2007.11a / pp.227-230 / 2007
  • In this paper, we propose a dodecagon-based Q-learning algorithm using an SVM for object tracking by a robot. To demonstrate the effectiveness of the proposed algorithm, we assume an experiment with two robots, obstacles, and one target, in which each robot must find the hidden target. Experiments were carried out using a random policy, a fusion model of DBAM and AMAB, and finally the proposed SVM and dodecagon-based Q-learning algorithm, and the validity of the approach was verified by comparing the three methods.
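
The abstract does not explain the construction; assuming that "dodecagon-based" refers to discretizing the robot's heading into 12 sectors of 30° each as the Q-learning action space, a toy sketch could look like this:

```python
import math

N_SECTORS = 12  # assumption: the "dodecagon" is a 12-way heading discretization

def heading_to_sector(heading_rad: float) -> int:
    """Map a continuous heading angle to one of 12 discrete Q-learning actions."""
    return int((heading_rad % (2 * math.pi)) / (2 * math.pi / N_SECTORS))

def sector_to_heading(sector: int) -> float:
    """Center angle of a sector, used as the robot's commanded heading."""
    return (sector + 0.5) * (2 * math.pi / N_SECTORS)
```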

Q-learning for intersection traffic flow Control based on agents

  • Zhou, Xuan; Chong, Kil-To
    • Proceedings of the IEEK Conference / 2009.05a / pp.94-96 / 2009
  • In this paper, we present a Q-learning method for adaptive traffic signal control based on multi-agent technology. The structure is composed of six phase agents and one intersection agent, and a wireless communication network enables the agents to cooperate. Q-learning, a kind of reinforcement learning, is adopted as the control mechanism because it can acquire optimal control strategies from delayed rewards; furthermore, we adopt a dynamic learning method instead of a static one, which is more practical. Simulation results indicate that the approach is more effective than a traditional signal system.
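
The paper's state and reward encodings are not given; as a hedged sketch, a phase agent might keep a Q-table over discretized queue lengths and choose between extending its green phase and switching (all encodings below are assumptions):

```python
from collections import defaultdict
import random

ACTIONS = ("extend_green", "switch_phase")  # hypothetical phase-agent actions
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1

Q = defaultdict(float)  # Q[(queue_state, action)] -> value

def discretize(queue_lengths):
    """Bucket per-lane queue lengths (vehicle counts) into a small state key."""
    return tuple(min(q // 5, 3) for q in queue_lengths)  # 4 buckets per lane

def act(state):
    """Epsilon-greedy action selection for one phase agent."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def learn(state, action, delayed_reward, next_state):
    """The reward (e.g., negative total delay) only arrives after the phase ends."""
    best_next = max(Q[(next_state, a)] for a in ACTIONS)
    Q[(state, action)] += ALPHA * (delayed_reward + GAMMA * best_next - Q[(state, action)])
```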

Reinforcement Learning Using State Space Compression (상태 공간 압축을 이용한 강화학습)

  • Kim, Byeong-Cheon; Yun, Byeong-Ju
    • The Transactions of the Korea Information Processing Society / v.6 no.3 / pp.633-640 / 1999
  • Reinforcement learning learns through trial-and-error interaction with a dynamic environment. In such environments, reinforcement learning methods like Q-learning and TD (Temporal Difference) learning are faster than conventional stochastic learning methods. However, because many of the proposed reinforcement learning algorithms grant a reinforcement value only when the learning agent reaches its goal state, most of them converge to the optimal solution too slowly. In this paper, we present the COMREL (COMpressed REinforcement Learning) algorithm for quickly finding the shortest path in a maze environment: it selects the candidate states that can guide the shortest path in the compressed maze environment and learns only those candidate states. Comparing COMREL with the existing Q-learning and Prioritized Sweeping algorithms shows that the learning time is shortened considerably.
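
COMREL's exact compression rule is not given in the abstract; one simple way to realize the idea, grouping fine maze cells into coarse blocks so that learning runs only over the compressed states, might be:

```python
BLOCK = 4  # assumption: group fine maze cells into 4x4 blocks

def compress(cell):
    """Map a fine maze cell (row, col) to a coarse candidate state."""
    row, col = cell
    return (row // BLOCK, col // BLOCK)

# Q-learning then runs over compressed states only, shrinking the table
# from |cells| x |actions| entries to roughly |cells| / BLOCK**2 x |actions|,
# so reinforcement from the goal state propagates through far fewer states.
```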

Online Evolution for Cooperative Behavior in Group Robot Systems

  • Lee, Dong-Wook; Seo, Sang-Wook; Sim, Kwee-Bo
    • International Journal of Control, Automation, and Systems / v.6 no.2 / pp.282-287 / 2008
  • In distributed mobile robot systems, autonomous robots accomplish complicated tasks through intelligent cooperation with each other. This paper presents behavior learning and online distributed evolution for the cooperative behavior of a group of autonomous robots. Learning and evolution capabilities are essential for a group of autonomous robots to adapt to unstructured environments. Behavior learning finds an optimal state-action mapping of a robot for a given operating condition. In behavior learning, a Q-learning algorithm is modified to handle delayed rewards in distributed robot systems. A group of robots implements cooperative behaviors through communication with other robots. Individual robots improve the state-action mapping through online evolution, using a crossover operator based on the Q-values and their update frequencies. A cooperative material-search problem demonstrates the effectiveness of the proposed behavior learning and online distributed evolution method for implementing the cooperative behavior of a group of autonomous mobile robots.
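
The abstract describes a crossover operator based on Q-values and update frequencies without giving details; one plausible (purely hypothetical) realization keeps an update count per state-action pair and samples each child entry from the better-supported parent:

```python
import random

def crossover(q_a, n_a, q_b, n_b):
    """Combine two robots' Q-tables entry by entry.

    q_a, q_b: dicts mapping (state, action) -> Q-value
    n_a, n_b: dicts mapping (state, action) -> update count
    The parent with more updates for a pair is trusted proportionally more.
    """
    child = {}
    for key in set(q_a) | set(q_b):
        na, nb = n_a.get(key, 0), n_b.get(key, 0)
        if na + nb == 0:
            child[key] = 0.0
        else:
            # Frequency-weighted pick: sample the better-supported parent.
            p_a = na / (na + nb)
            child[key] = q_a.get(key, 0.0) if random.random() < p_a else q_b.get(key, 0.0)
    return child
```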

Path selection algorithm for multi-path system based on deep Q learning (Deep Q 학습 기반의 다중경로 시스템 경로 선택 알고리즘)

  • Chung, Byung Chang; Park, Heasook
    • Journal of the Korea Institute of Information and Communication Engineering / v.25 no.1 / pp.50-55 / 2021
  • A multi-path system utilizes multiple networks simultaneously and is expected to enhance the speed, reliability, and security of network communication. In this paper, we focus on path selection in a multi-path system. To select the optimal path, we propose a deep reinforcement learning algorithm whose reward is based on the round-trip time (RTT) of each network. Unlike a multi-armed bandit model, deep Q-learning is applied so that rapidly changing situations can be taken into account. Because RTT measurements arrive with a delay, we also propose a compensation algorithm for the delayed reward. Moreover, we implement a testbed learning server to evaluate the performance of the proposed algorithm; the server contains a distributed database and a TensorFlow module to operate the deep learning algorithm efficiently. Simulations show that the proposed algorithm outperforms lowest-RTT path selection by about 20%.
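
The paper's compensation algorithm is its own; a common way to handle rewards that arrive late, holding transitions in a pending buffer until the RTT measurement comes back, can be sketched as follows (the negative-RTT reward is an assumption):

```python
from collections import deque

class DelayedRewardBuffer:
    """Hold (state, action, next_state) transitions until their RTT-based
    reward arrives, then release completed transitions for DQN training.
    A simplified illustration, not the paper's exact compensation scheme."""

    def __init__(self):
        self.pending = {}      # transition_id -> (state, action, next_state)
        self.ready = deque()   # completed transitions, consumed by the trainer

    def add(self, tid, state, action, next_state):
        self.pending[tid] = (state, action, next_state)

    def reward_arrived(self, tid, rtt):
        state, action, next_state = self.pending.pop(tid)
        reward = -rtt          # assumption: lower RTT means higher reward
        self.ready.append((state, action, reward, next_state))
```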

Fuzzy Q-learning using Weighted Eligibility (가중 기여도를 이용한 퍼지 Q-learning)

  • 정석일; 이연정
    • Proceedings of the Korean Institute of Intelligent Systems Conference / 2000.11a / pp.163-167 / 2000
  • Eligibility is used to solve the credit-assignment problem, one of the important problems in reinforcement learning. Conventional eligibilities, namely accumulating eligibility and replacing eligibility, make ineffective use of the rewards acquired during learning, because only the executed action in a visited state is learned. We therefore propose a new eligibility, called weighted eligibility, with which not only the executed action but also neighboring actions in a visited state are learned. A fuzzy Q-learning algorithm using the proposed eligibility is applied to a cart-pole balancing problem and shows improved learning speed.
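
As a rough sketch of the idea (weights and decay values hypothetical), a weighted eligibility trace marks the executed action fully and neighboring actions partially, so that the subsequent TD update $\Delta Q = \alpha \delta e$ spreads credit beyond the single executed action:

```python
def update_weighted_eligibility(elig, state, actions, executed, spread=0.5, decay=0.9):
    """elig: dict mapping (state, action) -> trace value.

    Decay all traces, then mark the executed action fully and its
    neighbors partially, so nearby actions also receive TD credit.
    """
    for key in elig:
        elig[key] *= decay
    i = actions.index(executed)
    for j, a in enumerate(actions):
        # weight falls off with distance from the executed action
        elig[(state, a)] = max(elig.get((state, a), 0.0), spread ** abs(i - j))
```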

Enhanced Machine Learning Algorithms: Deep Learning, Reinforcement Learning, and Q-Learning

  • Park, Ji Su; Park, Jong Hyuk
    • Journal of Information Processing Systems / v.16 no.5 / pp.1001-1007 / 2020
  • In recent years, machine learning algorithms have been continuously used and expanded in various fields, such as facial recognition, signal processing, personal authentication, and stock prediction. In particular, algorithms such as deep learning, reinforcement learning, and Q-learning are continuously being improved, and among them deep learning is expanding especially rapidly. Nevertheless, machine learning has not yet been applied in several fields, such as personal authentication technology, an essential tool in the digital information era; walking recognition, a promising biometric; and techniques for solving state-space problems. This paper therefore reviews how deep learning, reinforcement learning, and Q-learning, as representative machine learning algorithms, are being improved and extended in fields such as agricultural technology, personal authentication, wireless networks, games, biometric recognition, and image recognition.

Model-free $H_{\infty}$ Control of Linear Discrete-time Systems using Q-learning and LMI Based on I/O Data (입출력 데이터 기반 Q-학습과 LMI를 이용한 선형 이산 시간 시스템의 모델-프리 $H_{\infty}$ 제어기 설계)

  • Kim, Jin-Hoon; Lewis, F.L.
    • The Transactions of The Korean Institute of Electrical Engineers / v.58 no.7 / pp.1411-1417 / 2009
  • In this paper, we consider the design of $H_{\infty}$ control for linear discrete-time systems having no mathematical model. The basic approach is to use Q-learning, a reinforcement learning method based on the actor-critic structure. The model-free control design uses not a mathematical model of the system but information on its states and inputs. As a result, the derived iterative algorithm is expressed as linear matrix inequalities (LMIs) in data measured from the system states and inputs. It is shown that, for a sufficiently rich disturbance, this algorithm converges to the standard $H_{\infty}$ control solution obtained using the exact system model. A simple numerical example shows the usefulness of the result for practical application.
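
For context, the standard zero-sum-game formulation behind model-free $H_{\infty}$ Q-learning (a sketch of the common setup; the paper's exact notation may differ) defines a quadratic Q-function over the state, control input, and disturbance:

$$Q(x_k, u_k, w_k) = x_k^{\top} R x_k + u_k^{\top} u_k - \gamma^2 w_k^{\top} w_k + V(x_{k+1}) = z_k^{\top} H z_k, \qquad z_k = \begin{bmatrix} x_k^{\top} & u_k^{\top} & w_k^{\top} \end{bmatrix}^{\top}$$

The kernel matrix $H$ is estimated from measured state and input data rather than from the system matrices, and the control and worst-case disturbance policies are recovered from the blocks of $H$ at the saddle point, which is where the LMI formulation of the iterative algorithm enters.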