• Title/Abstract/Keywords: Reinforcement Learning


Acrobot Swing Up 제어를 위한 Credit-Assigned-CMAC 기반의 강화학습 (Credit-Assigned-CMAC-based Reinforcement Learning with application to the Acrobot Swing Up Control Problem)

  • 신연용;장시영;서승환;서일홍
    • 대한전기학회:학술대회논문집
    • /
    • 대한전기학회 2003년도 학술회의 논문집 정보 및 제어부문 B
    • /
    • pp.621-624
    • /
    • 2003
  • For real-world applications of reinforcement learning techniques, function approximation or generalization is required to avoid the curse of dimensionality. To this end, an improved function-approximation-based reinforcement learning method is proposed that speeds up convergence by using CA-CMAC (Credit-Assigned Cerebellar Model Articulation Controller). To show that the proposed CACRL (CA-CMAC-based Reinforcement Learning) performs better than CRL (CMAC-based Reinforcement Learning), computer simulation results are presented for the swing-up control problem of an acrobot.
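
The abstract above rests on CMAC tile coding as the function approximator for the learned values. Below is a minimal sketch of such an approximator, assuming a 1-D state, illustrative tiling sizes, and a uniform split of the correction across active tiles; CA-CMAC's refinement is to weight that split by per-tile credit instead.

```python
import numpy as np

# Minimal CMAC function approximator sketch (1-D state, made-up tiling
# layout and learning rate; not the paper's exact configuration).
class CMAC:
    def __init__(self, n_tilings=8, tiles_per_dim=10, low=-1.0, high=1.0):
        self.n_tilings = n_tilings
        self.tiles = tiles_per_dim
        self.low, self.high = low, high
        # one weight table per tiling
        self.w = np.zeros((n_tilings, tiles_per_dim + 1))

    def _indices(self, x):
        # each tiling is offset by a fraction of one tile width
        frac = (x - self.low) / (self.high - self.low)
        for t in range(self.n_tilings):
            offset = t / self.n_tilings
            yield t, int(frac * self.tiles + offset) % (self.tiles + 1)

    def value(self, x):
        # the approximated value is the sum over active tiles
        return sum(self.w[t, i] for t, i in self._indices(x))

    def update(self, x, target, alpha=0.1):
        # plain CMAC splits the error evenly across active tiles;
        # CA-CMAC instead weights this correction by per-tile credit
        err = target - self.value(x)
        for t, i in self._indices(x):
            self.w[t, i] += alpha * err / self.n_tilings

cmac = CMAC()
for _ in range(200):
    cmac.update(0.3, 1.0)
print(round(cmac.value(0.3), 2))  # → 1.0
```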


자기 조직화 맵을 이용한 강화학습 제어기 설계 (Design of Reinforcement Learning Controller with Self-Organizing Map)

  • 이재강;김일환
    • 대한전기학회논문지:시스템및제어부문D
    • /
    • 제53권5호
    • /
    • pp.353-360
    • /
    • 2004
  • This paper considers reinforcement learning control with a self-organizing map. Reinforcement learning uses the observable states of the objective system and the signals produced by the interaction between the system and its environment as input data. For fast learning in neural network training, the amount of training data must be reduced. In this paper, we use a self-organizing map to partition the observable states; partitioning the states reduces the amount of data used for training the neural networks. The neural dynamic programming design method is used for the controller. To evaluate the designed reinforcement learning controller, an inverted pendulum on a cart is simulated. The designed controller consists of a self-organizing map connected in series with two multi-layer feed-forward neural networks.
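
The state-partitioning step the abstract describes can be sketched as a small self-organizing map that maps each observed state to its nearest prototype; the map size, learning schedule, and 2-D toy states below are illustrative assumptions, not the paper's settings.

```python
import numpy as np

# SOM sketch: reduce a stream of observed states to a handful of
# prototype states for the downstream neural controller.
rng = np.random.default_rng(0)
states = rng.uniform(-1, 1, size=(500, 2))       # e.g. (angle, velocity) pairs

n_nodes = 9
weights = rng.uniform(-1, 1, size=(n_nodes, 2))  # SOM codebook vectors

for epoch in range(20):
    lr = 0.5 * (1 - epoch / 20)       # decaying learning rate
    radius = max(1, 3 - epoch // 7)   # shrinking neighborhood (1-D topology)
    for s in states:
        bmu = int(np.argmin(np.linalg.norm(weights - s, axis=1)))  # best match
        for j in range(n_nodes):
            if abs(j - bmu) <= radius:            # neighborhood on a line
                weights[j] += lr * (s - weights[j])

# each state is now represented by its nearest prototype (its partition)
labels = [int(np.argmin(np.linalg.norm(weights - s, axis=1))) for s in states]
print(len(set(labels)))  # number of distinct partitions actually used
```

The controller then trains on at most `n_nodes` prototype states rather than the raw continuous observations, which is the data reduction the abstract refers to.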

Dynamic Positioning of Robot Soccer Simulation Game Agents using Reinforcement learning

  • Kwon, Ki-Duk;Cho, Soo-Sin;Kim, In-Cheol
    • 한국지능정보시스템학회:학술대회논문집
    • /
    • 한국지능정보시스템학회 2001년도 The Pacific Asian Conference On Intelligent Systems 2001
    • /
    • pp.59-64
    • /
    • 2001
  • The robot soccer simulation game is a dynamic multi-agent environment. In this paper we suggest a new reinforcement learning approach to each agent's dynamic positioning in such a dynamic environment. Reinforcement learning is the branch of machine learning in which an agent learns, from indirect and delayed reward, an optimal policy for choosing sequences of actions that produce the greatest cumulative reward. Reinforcement learning therefore differs from supervised learning in that no input/output pairs are presented as training examples. Furthermore, model-free reinforcement learning algorithms such as Q-learning do not require defining or learning any model of the surrounding environment, yet they can learn the optimal policy if the agent can visit every state-action pair infinitely often. However, the biggest problem of monolithic reinforcement learning is that its straightforward applications do not scale up to more complex environments, due to the intractably large state space. To address this problem, we suggest Adaptive Mediation-based Modular Q-Learning (AMMQL) as an improvement on the existing Modular Q-Learning (MQL). While simple modular Q-learning combines the results from each learning module in a fixed way, AMMQL combines them more flexibly by assigning each module a weight according to its contribution to the reward. Therefore, in addition to handling large state spaces effectively, AMMQL shows higher adaptability to environmental changes than pure MQL. This paper introduces the concept of AMMQL and presents the details of its application to the dynamic positioning of robot soccer agents.
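
The mediation idea behind AMMQL can be sketched as follows: each module keeps its own Q-values over its sub-view of the state, and the mediator combines module preferences with weights that grow for modules whose advice coincided with rewarded actions. The toy Q-tables and the exact weight-update rule here are illustrative assumptions, not the paper's formulation.

```python
import numpy as np

# Two learning modules, each with its own Q-table; the mediator holds a
# weight per module and blends their action values.
rng = np.random.default_rng(1)
n_states, n_actions = 4, 3
modules = [rng.random((n_states, n_actions)) for _ in range(2)]
w = np.ones(2) / 2                      # mediation weights, start uniform

def select_action(state):
    combined = sum(wi * q[state] for wi, q in zip(w, modules))
    return int(np.argmax(combined))     # act on the weighted combination

def update_weights(state, action, reward, beta=0.1):
    global w
    # credit modules whose own greedy choice matched the rewarded action
    for i, q in enumerate(modules):
        if int(np.argmax(q[state])) == action:
            w[i] += beta * reward
    w = np.maximum(w, 1e-6)
    w /= w.sum()                        # keep weights a convex combination

a = select_action(0)
update_weights(0, a, reward=1.0)
print(w.round(3), a)
```

Fixed equal weights recover plain modular Q-learning; letting the weights track reward contribution is what gives the adaptive mediation.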


강화학습을 이용한 n-Queen 문제의 수렴속도 향상 (The Improvement of Convergence Rate in n-Queen Problem Using Reinforcement learning)

  • 임수연;손기준;박성배;이상조
    • 한국지능시스템학회논문지
    • /
    • 제15권1호
    • /
    • pp.1-5
    • /
    • 2005
  • The objective of reinforcement learning is to maximize the reward given by the environment, and a reinforcement learning agent learns through trial-and-error interaction with the external environment. Q-Learning, a representative reinforcement learning algorithm, is a kind of TD-Learning that learns from the temporal difference of evaluation values; it obtains an optimal policy by repeatedly experiencing the evaluation values of every state-action pair in the state space. In this paper, the n-Queen problem is chosen as the example for applying reinforcement learning, and Q-Learning is used as the problem-solving algorithm. Comparative experiments between existing methods for solving the n-Queen problem and the proposed method show that the reinforcement learning approach reduces the number of state transitions needed to reach the goal and therefore converges to the optimal solution faster.
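
The Q-Learning update the abstract refers to is Q(s,a) ← Q(s,a) + α·(r + γ·max_a′ Q(s′,a′) − Q(s,a)). A sketch on a toy chain task, since the abstract does not spell out its n-Queen state/action encoding:

```python
import random

# Tabular Q-learning on a 5-state chain: start at state 0, reward 1 for
# reaching state 4; actions move left (-1) or right (+1).
random.seed(0)
N, GOAL = 5, 4
Q = {(s, a): 0.0 for s in range(N) for a in (-1, 1)}
alpha, gamma, eps = 0.5, 0.9, 0.2

for episode in range(300):
    s = 0
    while s != GOAL:
        # epsilon-greedy action selection
        a = random.choice((-1, 1)) if random.random() < eps else \
            max((-1, 1), key=lambda x: Q[(s, x)])
        s2 = min(max(s + a, 0), N - 1)
        r = 1.0 if s2 == GOAL else 0.0
        # the Q-learning (TD) update
        Q[(s, a)] += alpha * (r + gamma * max(Q[(s2, -1)], Q[(s2, 1)]) - Q[(s, a)])
        s = s2

# the greedy policy should move right in every non-goal state
print([max((-1, 1), key=lambda a: Q[(s, a)]) for s in range(GOAL)])
```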

Reinforcement Learning Control using Self-Organizing Map and Multi-layer Feed-Forward Neural Network

  • Lee, Jae-Kang;Kim, Il-Hwan
    • 제어로봇시스템학회:학술대회논문집
    • /
    • 제어로봇시스템학회 2003년도 ICCAS
    • /
    • pp.142-145
    • /
    • 2003
  • Many control applications using neural networks need a priori information about the objective system, but it is impossible to obtain exact information about the objective system in the real world. Several control methods have been proposed to solve this problem; reinforcement learning control using a neural network is one of them. Reinforcement learning control basically needs no a priori information about the objective system: it uses as input data the reinforcement signal arising from the interaction between the objective system and its environment, together with the observable states of the objective system. However, many such methods take too much learning time to be applied to the real world, so we focus on faster learning. Two data types are used for reinforcement learning. One is the reinforcement signal, which takes only two fixed scalar values, one assigned to success states and one to failure states. The other is the observable state data: a real-world system has infinitely many states, so the number of observable state data is also infinite, which demands too much learning time for real-world application. We therefore reduce the number of observable states by classifying them with a self-organizing map, and we use neural dynamic programming for the controller design. An inverted pendulum on a cart is simulated. A failure signal is used as the reinforcement signal; it occurs when the pendulum angle or the cart position deviates from the defined control range. The control objective is to keep the pole balanced and the cart centered. Four states (the position and velocity of the cart, and the angle and angular velocity of the pole) are used as the state signal. The learning controller consists of a self-organizing map connected in series with two multi-layer feed-forward neural networks.
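
The two-valued reinforcement signal described above can be sketched as a failure predicate over the cart-pole state. The thresholds (12 degrees, 2.4 m) follow the classic cart-pole benchmark and are assumptions here, not values taken from the paper.

```python
import math

ANGLE_LIMIT = 12 * math.pi / 180   # rad; pole angle beyond this is failure
POS_LIMIT = 2.4                    # m; cart position beyond this is failure

def reinforcement(angle, cart_pos):
    """Two fixed scalar values: -1.0 on failure, 0.0 otherwise."""
    failed = abs(angle) > ANGLE_LIMIT or abs(cart_pos) > POS_LIMIT
    return -1.0 if failed else 0.0

print(reinforcement(0.05, 0.3), reinforcement(0.3, 0.0))  # 0.0 -1.0
```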


Multiple Reward Reinforcement learning control of a mobile robot in home network environment

  • Kang, Dong-Oh;Lee, Jeun-Woo
    • 제어로봇시스템학회:학술대회논문집
    • /
    • 제어로봇시스템학회 2003년도 ICCAS
    • /
    • pp.1300-1304
    • /
    • 2003
  • This paper deals with a control problem for a mobile robot in a home network environment. The home network lets the mobile robot communicate with sensors to obtain measurements and to adapt to changes in the environment. To maintain good control performance for the mobile robot despite changes in the home network environment, we use a fuzzy inference system with multiple reward reinforcement learning. Multiple reward reinforcement learning enables the mobile robot to consider multiple control objectives and to adapt itself to changes in the home network environment. A multiple reward fuzzy Q-learning method is proposed: multiple Q-values are maintained, and max-min optimization is applied to improve the fuzzy rules. To show the effectiveness of the proposed method, simulation results are given for a home network environment, i.e., LAN, wireless LAN, etc.
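
The max-min step over multiple Q-values can be sketched as follows: keep one Q-value per objective and pick the action whose worst objective value is best. The two objectives and their Q-values here are made-up illustrations, not the paper's.

```python
import numpy as np

# Q-values per action under two objectives (illustrative numbers)
Q_goal    = np.array([0.8, 0.4, 0.6])   # objective 1: reach the goal
Q_network = np.array([0.3, 0.9, 0.5])   # objective 2: keep the network link

Q_stack = np.vstack([Q_goal, Q_network])
worst_case = Q_stack.min(axis=0)        # min over objectives, per action
action = int(np.argmax(worst_case))     # max-min choice
print(action)  # 2: its worst case 0.5 beats 0.3 and 0.4
```

A single-objective learner would pick action 1 (Q_network = 0.9) and risk the goal objective; max-min trades peak value for balance across objectives.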


퍼지 추론에 의한 리커런트 뉴럴 네트워크 강화학습 (Fuzzy Inference-based Reinforcement Learning for Recurrent Neural Network)

  • 전효병;이동욱;김대준;심귀보
    • 한국지능시스템학회:학술대회논문집
    • /
    • 한국퍼지및지능시스템학회 1997년도 춘계학술대회 학술발표 논문집
    • /
    • pp.120-123
    • /
    • 1997
  • In this paper, we propose a fuzzy inference-based reinforcement learning algorithm. By using fuzzy inference in reinforcement learning, we offer a learning scheme closer to the psychological learning of higher animals, including humans. The proposed method mirrors the way linguistic and conceptual expressions influence human behavior, by reasoning about the reinforcement with fuzzy rules. The intervals of the fuzzy membership functions are optimized by genetic algorithms, and a recurrent state is used so that actions can be taken in a dynamic environment. We show the validity of the proposed learning algorithm by applying it to the inverted pendulum control problem.
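
The fuzzy reasoning about reinforcement can be sketched with triangular membership functions whose breakpoints are exactly the intervals a genetic algorithm would tune. The variable, the single rule pair, and all breakpoint values below are illustrative assumptions.

```python
def tri(x, a, b, c):
    """Triangular membership with peak at b and support [a, c]."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)

def fuzzy_reinforcement(angle_err):
    # degree to which the pole-angle error is "good" vs. "bad"
    good = tri(angle_err, -0.2, 0.0, 0.2)   # breakpoints: GA-tunable intervals
    bad = 1.0 - good
    # weighted-average defuzzification of the two rule consequents (+1 / -1)
    return 1.0 * good + (-1.0) * bad

print(fuzzy_reinforcement(0.0), fuzzy_reinforcement(0.2))  # 1.0 -1.0
```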


Actor-Critic Reinforcement Learning System with Time-Varying Parameters

  • Obayashi, Masanao;Umesako, Kosuke;Oda, Tazusa;Kobayashi, Kunikazu;Kuremoto, Takashi
    • 제어로봇시스템학회:학술대회논문집
    • /
    • 제어로봇시스템학회 2003년도 ICCAS
    • /
    • pp.138-141
    • /
    • 2003
  • Recently, reinforcement learning has attracted the attention of many researchers because of its simple and flexible learning ability in arbitrary environments, and many reinforcement learning methods have been proposed, such as Q-learning, actor-critic, and the stochastic gradient ascent method. A reinforcement learning system can adapt to changes in its environment through mutual interaction with it; however, when the environment changes periodically, it cannot adapt well. In this paper we propose a reinforcement learning system that adapts to periodic changes in the environment by introducing time-varying adjustable parameters. A simulation study of a maze problem with an aisle that opens and closes periodically shows that the proposed method works well, while the conventional method with constant adjustable parameters does not.
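
The actor-critic structure the abstract builds on can be sketched as a critic that learns state values from the TD error, with the same TD error adjusting the actor's action preferences. The two-state toy environment below is an illustrative assumption; the paper's contribution, making the adjusted parameters time-varying, is noted in the comments but not implemented here.

```python
import math
import random

random.seed(0)
V = [0.0, 0.0]                    # critic: state values
pref = [[0.0, 0.0], [0.0, 0.0]]  # actor: preference per (state, action)
alpha, beta, gamma = 0.1, 0.1, 0.9

def policy(s):
    # softmax over action preferences
    e = [math.exp(p) for p in pref[s]]
    return 0 if random.random() < e[0] / sum(e) else 1

for step in range(2000):
    s = random.randrange(2)
    a = policy(s)
    r = 1.0 if a == s else 0.0          # matching action is rewarded
    s2 = random.randrange(2)
    delta = r + gamma * V[s2] - V[s]    # TD error
    V[s] += alpha * delta               # critic update
    pref[s][a] += beta * delta          # actor update
    # the paper would make pref/V parameters time-varying (e.g. periodic
    # in step) so the system can track a periodically changing environment

print([max(range(2), key=lambda a: pref[s][a]) for s in (0, 1)])
```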


가상 환경에서의 강화학습을 활용한 모바일 로봇의 장애물 회피 (Obstacle Avoidance of Mobile Robot Using Reinforcement Learning in Virtual Environment)

  • 이종락
    • 사물인터넷융복합논문지
    • /
    • 제7권4호
    • /
    • pp.29-34
    • /
    • 2021
  • Applying reinforcement learning to a robot in a real environment requires a vast number of training iterations, so simulation in a virtual environment is unavoidable. Moreover, when the robot actually deployed has low-end hardware, applying computationally heavy learning algorithms is difficult. In this study, to apply reinforcement learning to the obstacle collision avoidance problem of a mobile robot with low-end hardware, ML-Agents, the reinforcement learning framework provided by Unity, was used as the virtual simulation environment. DQN, as provided by ML-Agents, was used as the reinforcement learning algorithm, and when the trained result was applied to the actual robot, at most two collisions per minute occurred.
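
The DQN training loop the study relies on can be sketched as follows: a replay buffer, epsilon-greedy action choice, and a periodically synced target network. To stay self-contained, the sketch substitutes a linear Q-function for the deep network and a random stand-in for the Unity ML-Agents environment; the feature count, action set, and all hyperparameters are assumptions.

```python
import random
from collections import deque

import numpy as np

random.seed(0); np.random.seed(0)
n_features, n_actions = 4, 3            # e.g. range sensors; turn/forward
W = np.zeros((n_actions, n_features))   # online Q weights
W_target = W.copy()                     # target-network copy
buffer = deque(maxlen=1000)             # experience replay buffer
gamma, alpha, eps = 0.95, 0.01, 0.1

def q(weights, s):
    return weights @ s                  # Q-value per action

def step_env(s, a):
    # stand-in for the simulator: random next state, penalty on "collision"
    s2 = np.random.rand(n_features)
    r = -1.0 if s2.min() < 0.05 else 0.0
    return s2, r

s = np.random.rand(n_features)
for t in range(500):
    # epsilon-greedy action from the online network
    a = random.randrange(n_actions) if random.random() < eps \
        else int(np.argmax(q(W, s)))
    s2, r = step_env(s, a)
    buffer.append((s, a, r, s2))
    # sample a minibatch and take one TD step toward the target network
    for bs, ba, br, bs2 in random.sample(buffer, min(32, len(buffer))):
        target = br + gamma * np.max(q(W_target, bs2))
        W[ba] += alpha * (target - q(W, bs)[ba]) * bs
    if t % 50 == 0:
        W_target = W.copy()             # sync target network
    s = s2

print(W.shape)
```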

커리큘럼을 이용한 투서클 기반 항공기 헤드온 공중 교전 강화학습 기법 연구 (Two Circle-based Aircraft Head-on Reinforcement Learning Technique using Curriculum)

  • 황인수;배정호
    • 한국군사과학기술학회지
    • /
    • 제26권4호
    • /
    • pp.352-360
    • /
    • 2023
  • Recently, AI pilots using reinforcement learning have been developed to a level that is more flexible than rule-based methods and may replace human pilots. In this paper, a curriculum is used to help learn head-on combat with reinforcement learning. Learning head-on engagements with reinforcement learning alone is not easy, but through the proposed two-circle-based head-on air combat learning technique, the ownship gradually increases the difficulty and becomes proficient at head-on combat. On the two circles, the ATA angle between the ownship and the target is gradually increased and the AA angle is gradually decreased as learning proceeds. Reinforcement learning was performed with and without the curriculum, and the resulting agents were engaged against a rule-based model. As the win ratio of the curriculum-based model rose to nearly 100 %, its superior performance was confirmed.
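
The curriculum schedule described above can be sketched as a simple stage function: as training progresses, the ATA (antenna train angle) is raised toward the head-on geometry while the AA (aspect angle) is lowered. The stage count and end-point angles below are illustrative assumptions.

```python
def curriculum(stage, n_stages=10, ata_max=180.0, aa_start=180.0):
    """Return (ATA, AA) in degrees for a given curriculum stage."""
    frac = stage / (n_stages - 1)
    ata = frac * ata_max          # gradually approach the head-on geometry
    aa = aa_start * (1 - frac)    # gradually reduce the aspect angle
    return ata, aa

schedule = [curriculum(k) for k in range(10)]
print(schedule[0], schedule[-1])  # (0.0, 180.0) (180.0, 0.0)
```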