• Title/Summary/Keyword: Q learning


Path Planning of Unmanned Aerial Vehicle based Reinforcement Learning using Deep Q Network under Simulated Environment (시뮬레이션 환경에서의 DQN을 이용한 강화 학습 기반의 무인항공기 경로 계획)

  • Lee, Keun Hyoung;Kim, Shin Dug
    • Journal of the Semiconductor & Display Technology / v.16 no.3 / pp.127-130 / 2017
  • In this research, we present a path planning method for autonomous flight of unmanned aerial vehicles (UAVs) through reinforcement learning in a simulated environment. We design a simulator for reinforcement learning of UAVs and implement an interface for compatibility between the Deep Q-Network (DQN) and the simulator. We then perform reinforcement learning through the simulator and DQN, using the Q-learning algorithm, a kind of reinforcement learning algorithm. Through experiments, we verify the performance of the DQN-simulator combination. Finally, we evaluate the learning results and suggest a path planning strategy using reinforcement learning.

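To make the mechanism concrete, here is a minimal sketch of the Q-network update that a DQN-plus-simulator loop of this kind would run, written in PyTorch. It is not the authors' code: the layer sizes, the state and action dimensions, and the batched-tensor interface are assumptions.

```python
import torch
import torch.nn as nn

n_states, n_actions = 4, 6   # assumed UAV state features / discrete maneuvers
q_net = nn.Sequential(nn.Linear(n_states, 64), nn.ReLU(), nn.Linear(64, n_actions))
target_net = nn.Sequential(nn.Linear(n_states, 64), nn.ReLU(), nn.Linear(64, n_actions))
target_net.load_state_dict(q_net.state_dict())     # frozen bootstrap target
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)
gamma = 0.99

def dqn_update(s, a, r, s_next, done):
    """One gradient step on the Bellman error for a batch of transitions."""
    q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)        # Q(s, a)
    with torch.no_grad():
        target = r + gamma * target_net(s_next).max(1).values * (1 - done)
    loss = nn.functional.mse_loss(q_sa, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

The simulator's role is to supply the (s, a, r, s_next, done) transitions; the rest of the loop is the standard DQN recipe.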

Optimization of Stock Trading System based on Multi-Agent Q-Learning Framework (다중 에이전트 Q-학습 구조에 기반한 주식 매매 시스템의 최적화)

  • Kim, Yu-Seop;Lee, Jae-Won;Lee, Jong-Woo
    • The KIPS Transactions: Part B / v.11B no.2 / pp.207-212 / 2004
  • This paper presents a reinforcement learning framework for stock trading systems. Trading system parameters are optimized by the Q-learning algorithm, and neural networks are adopted for value approximation. In this framework, cooperative multiple agents are used to efficiently integrate global trend prediction and local trading strategy to obtain better trading performance. Agents communicate with each other, sharing training episodes and learned policies, while keeping the overall scheme of conventional Q-learning. Experimental results on KOSPI 200 show that a trading system based on the proposed framework outperforms the market average and makes appreciable profits. Furthermore, in terms of risk management, the system is superior to a system trained by supervised learning.
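
Once each agent keeps a conventional Q-table, the episode-sharing idea in this framework takes only a few lines. The sketch below is illustrative only: the action sets, states, and toy transitions are assumptions, not the paper's trading signals, and the paper's neural value approximation is omitted.

```python
from collections import defaultdict

class Agent:
    """Plain tabular Q-learning agent with the conventional update rule."""
    def __init__(self, actions, alpha=0.1, gamma=0.95):
        self.Q = defaultdict(float)
        self.actions, self.alpha, self.gamma = actions, alpha, gamma

    def update(self, s, a, r, s_next):
        best_next = max(self.Q[(s_next, a2)] for a2 in self.actions)
        self.Q[(s, a)] += self.alpha * (r + self.gamma * best_next - self.Q[(s, a)])

# Cooperative agents, e.g. one for global trend and one for local trading.
# Sharing an episode means every transition updates both Q-tables.
trend_agent, trade_agent = Agent(range(3)), Agent(range(3))
shared_episode = [("s0", 1, 0.5, "s1"), ("s1", 2, -0.2, "s2")]   # toy data
for transition in shared_episode:
    trend_agent.update(*transition)
    trade_agent.update(*transition)
```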

Multi-agent Q-learning based Admission Control Mechanism in Heterogeneous Wireless Networks for Multiple Services

  • Chen, Jiamei;Xu, Yubin;Ma, Lin;Wang, Yao
    • KSII Transactions on Internet and Information Systems (TIIS) / v.7 no.10 / pp.2376-2394 / 2013
  • To ensure both overall system capacity and users' QoS requirements in heterogeneous wireless networks, the admission control mechanism must be well designed. In this paper, a Multi-agent Q-learning based Admission Control Mechanism (MQACM) is proposed to handle new and handoff call access problems appropriately. MQACM obtains the optimal decision policy using the Multi-agent Q-learning (MQ) method, an improved form of the single-agent Q-learning method, which is newly introduced here to solve the admission control problem in heterogeneous wireless networks. In addition, different priorities are allocated to multiple services so that MQACM performs well even in congested network scenarios. Both analysis and simulation results show that the proposed method not only outperforms existing schemes in call blocking probability and handoff dropping probability, but also offers better network universality and stability.
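
Read as pseudocode, an admission controller of this kind can be pictured as Q-learning over a coarse load state with accept/reject actions and priority-weighted rewards. The sketch below is our reading, not the paper's MQACM: the state encoding, reward, and constants are assumptions.

```python
import random

ACTIONS = ("accept", "reject")
Q = {}                              # (load_level, priority, action) -> value
alpha, gamma, eps = 0.1, 0.9, 0.1

def choose(load, prio):
    """Epsilon-greedy admission decision for an arriving new or handoff call."""
    if random.random() < eps:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q.get((load, prio, a), 0.0))

def update(load, prio, action, reward, next_load, next_prio):
    """Standard Q-learning backup; the reward would weight QoS by priority."""
    best_next = max(Q.get((next_load, next_prio, a), 0.0) for a in ACTIONS)
    old = Q.get((load, prio, action), 0.0)
    Q[(load, prio, action)] = old + alpha * (reward + gamma * best_next - old)
```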

Multiple Queue Packet Scheduling using Q-learning (큐러닝(Q-learning)을 이용한 다중 대기열 패킷 스케쥴링)

  • Jeong, Hyun-Seok;Lee, Tae-Ho;Lee, Byung-Jun;Kim, Kyoung-Tae;Youn, Hee-Yong
    • Proceedings of the Korean Society of Computer Information Conference / 2018.07a / pp.205-206 / 2018
  • In this paper, we propose a dynamic multi-queue scheduling scheme based on Q-learning for efficient packet delivery in wireless sensor network systems in IoT environments. The policy distributes CPU resources efficiently by deferring packet processing as long as the delay requirement of each queue allows. It also uses Q-learning to continuously track the state of each node and thereby prevent starvation. The proposed scheme satisfies the required quality of service without prior knowledge of the variable and unpredictable conditions of a wireless sensor network. Simulations show that, compared with existing learning-based packet scheduling algorithms, the proposed scheme is superior in providing flexible and fair service under complex requirements.

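A rough sketch of this idea, under assumed state and reward definitions: the state is each queue's slack against its delay bound, the action is which queue to serve, and the reward would penalize deadline misses so that deferring work never starves a queue.

```python
import random

N_QUEUES = 3
alpha, gamma, eps = 0.1, 0.9, 0.1
Q = {}                                  # (slack_state, queue_index) -> value

def slack_state(queues, now):
    """Discretized slack (head-packet deadline minus now) per queue; 0 if empty."""
    return tuple(min(3, max(0, q[0] - now)) if q else 0 for q in queues)

def choose_queue(state):
    if random.random() < eps:
        return random.randrange(N_QUEUES)
    return max(range(N_QUEUES), key=lambda a: Q.get((state, a), 0.0))

def learn(state, action, reward, next_state):
    best = max(Q.get((next_state, a), 0.0) for a in range(N_QUEUES))
    old = Q.get((state, action), 0.0)
    Q[(state, action)] = old + alpha * (reward + gamma * best - old)
```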

Generation of Ship's Optimal Route based on Q-Learning (Q-러닝 기반의 선박의 최적 경로 생성)

  • Lee, Hyeong-Tak;Kim, Min-Kyu;Yang, Hyun
    • Proceedings of the Korean Institute of Navigation and Port Research Conference / 2023.05a / pp.160-161 / 2023
  • Currently, a ship's passage planning relies on the navigation officer's knowledge and empirical methods. However, as autonomous ship navigation technology has recently developed, automation of passage planning has been studied in various ways. In this study, we generate an optimal route for a ship based on Q-learning, one of the reinforcement learning techniques. Reinforcement learning is applied by training on experiences of various situations and making optimal decisions based on them.

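As a stand-in for route generation of this kind, here is a complete tabular Q-learning loop on a toy sea grid. The grid size, rewards, and absence of obstacles are all assumptions made for illustration.

```python
import random

W, H, GOAL = 8, 6, (7, 5)                     # assumed grid and goal cell
ACTIONS = [(1, 0), (-1, 0), (0, 1), (0, -1)]  # E, W, N, S moves
Q = {}
alpha, gamma, eps = 0.2, 0.95, 0.2

def step(s, a):
    """Clamp the move to the grid; small step cost, bonus at the goal."""
    nx = min(W - 1, max(0, s[0] + a[0]))
    ny = min(H - 1, max(0, s[1] + a[1]))
    return (nx, ny), (10.0 if (nx, ny) == GOAL else -0.1)

for _ in range(2000):                         # training episodes
    s = (0, 0)
    while s != GOAL:
        a = random.choice(ACTIONS) if random.random() < eps else \
            max(ACTIONS, key=lambda a2: Q.get((s, a2), 0.0))
        s2, r = step(s, a)
        best = max(Q.get((s2, a2), 0.0) for a2 in ACTIONS)
        old = Q.get((s, a), 0.0)
        Q[(s, a)] = old + alpha * (r + gamma * best - old)
        s = s2
```

Following the greedy action from the start cell after training traces out the learned route.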

Reinforcement Learning Approach to Agents Dynamic Positioning in Robot Soccer Simulation Games

  • Kwon, Ki-Duk;Kim, In-Cheol
    • Proceedings of the Korea Society for Simulation Conference / 2001.10a / pp.321-324 / 2001
  • The robot soccer simulation game is a dynamic multi-agent environment. In this paper we suggest a new reinforcement learning approach to each agent's dynamic positioning in such a dynamic environment. Reinforcement learning is the machine learning paradigm in which an agent learns, from indirect and delayed reward, an optimal policy for choosing sequences of actions that produce the greatest cumulative reward. Reinforcement learning therefore differs from supervised learning in that no input-output pairs are presented as training examples. Furthermore, model-free reinforcement learning algorithms such as Q-learning do not require defining or learning any model of the surrounding environment, yet they can learn the optimal policy if the agent can visit every state-action pair infinitely often. However, the biggest problem of monolithic reinforcement learning is that straightforward applications do not scale up to more complex environments due to the intractably large space of states. To address this problem, we suggest Adaptive Mediation-based Modular Q-Learning (AMMQL) as an improvement of the existing Modular Q-Learning (MQL). While simple modular Q-learning combines the results from each learning module in a fixed way, AMMQL combines them more flexibly by assigning each module a weight according to its contribution to rewards. Therefore, in addition to resolving the problem of large state spaces effectively, AMMQL shows higher adaptability to environmental changes than pure MQL. This paper introduces the concept of AMMQL and presents the details of its application to dynamic positioning of robot soccer agents.

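The distinctive step of AMMQL, as the abstract describes it, is combining module Q-values with adaptive rather than fixed weights. The sketch below captures that step only; the weight-update rule shown is an assumption, and the paper's exact rule may differ.

```python
def combined_q(module_qs, weights, state, action):
    """module_qs: list of dicts mapping (state, action) -> a module's Q-value."""
    return sum(w * q.get((state, action), 0.0) for q, w in zip(module_qs, weights))

def update_weights(weights, contributions, lr=0.05):
    """Raise the weight of modules that contributed to reward, then renormalize."""
    new = [max(1e-6, w + lr * c) for w, c in zip(weights, contributions)]
    total = sum(new)
    return [w / total for w in new]

# Toy usage: two positioning modules starting with equal weights.
modules = [{("near_ball", "advance"): 1.0}, {("near_ball", "advance"): -0.5}]
weights = [0.5, 0.5]
q = combined_q(modules, weights, "near_ball", "advance")
weights = update_weights(weights, contributions=[0.3, -0.1])
```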

On-line Reinforcement Learning for Cart-pole Balancing Problem (카트-폴 균형 문제를 위한 실시간 강화 학습)

  • Kim, Byung-Chun;Lee, Chang-Hoon
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.10 no.4 / pp.157-162 / 2010
  • The cart-pole balancing problem is a pseudo-standard benchmark problem in the field of control methods, including genetic algorithms, artificial neural networks, and reinforcement learning. In this paper, we propose a novel approach using online reinforcement learning (OREL) to solve the cart-pole balancing problem. The objective is to analyze the learning method of the OREL learning system on this problem. Experiments show that it approximates the optimal value function faster than Q-learning.
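
For comparison, a standard discretized Q-learning baseline for this benchmark is easy to state. This is not the paper's OREL method; the bin edges and hyperparameters are assumptions, and the gymnasium package supplies the environment.

```python
import gymnasium as gym
import numpy as np

env = gym.make("CartPole-v1")
bins = [np.linspace(-2.4, 2.4, 9), np.linspace(-3.0, 3.0, 9),
        np.linspace(-0.21, 0.21, 9), np.linspace(-3.0, 3.0, 9)]
Q = np.zeros((10, 10, 10, 10, env.action_space.n))
alpha, gamma, eps = 0.1, 0.99, 0.1

def discretize(obs):
    return tuple(np.digitize(x, b) for x, b in zip(obs, bins))

for episode in range(500):
    s, _ = env.reset()
    s, done = discretize(s), False
    while not done:
        a = env.action_space.sample() if np.random.rand() < eps else int(np.argmax(Q[s]))
        obs, r, terminated, truncated, _ = env.step(a)
        s2, done = discretize(obs), terminated or truncated
        # Do not bootstrap through a true terminal state.
        Q[s + (a,)] += alpha * (r + gamma * np.max(Q[s2]) * (not terminated) - Q[s + (a,)])
        s = s2
```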

Online Evolution for Cooperative Behavior in Group Robot Systems

  • Lee, Dong-Wook;Seo, Sang-Wook;Sim, Kwee-Bo
    • International Journal of Control, Automation, and Systems / v.6 no.2 / pp.282-287 / 2008
  • In distributed mobile robot systems, autonomous robots accomplish complicated tasks through intelligent cooperation with each other. This paper presents behavior learning and online distributed evolution for cooperative behavior of a group of autonomous robots. Learning and evolution capabilities are essential for a group of autonomous robots to adapt to unstructured environments. Behavior learning finds an optimal state-action mapping of a robot for a given operating condition. In behavior learning, a Q-learning algorithm is modified to handle delayed rewards in the distributed robot systems. A group of robots implements cooperative behaviors through communication with other robots. Individual robots improve the state-action mapping through online evolution with the crossover operator based on the Q-values and their update frequencies. A cooperative material search problem demonstrated the effectiveness of the proposed behavior learning and online distributed evolution method for implementing cooperative behavior of a group of autonomous mobile robots.
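
The crossover on state-action mappings described above can be sketched as follows; treating the Q-value scaled by its update count as the per-entry fitness is our assumption, not necessarily the paper's exact operator.

```python
def crossover(q_a, q_b, count_a, count_b):
    """q_*: dict (state, action) -> value; count_*: dicts of update frequencies.
    The child inherits, per entry, the parent value with higher count-weighted Q."""
    child = {}
    for key in set(q_a) | set(q_b):
        fit_a = q_a.get(key, 0.0) * count_a.get(key, 0)
        fit_b = q_b.get(key, 0.0) * count_b.get(key, 0)
        child[key] = q_a.get(key, 0.0) if fit_a >= fit_b else q_b.get(key, 0.0)
    return child
```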

Performance Enhancement of CSMA/CA MAC Protocol Based on Reinforcement Learning

  • Kim, Tae-Wook;Hwang, Gyung-Ho
    • Journal of Information and Communication Convergence Engineering / v.19 no.1 / pp.1-7 / 2021
  • Reinforcement learning is an area of machine learning that studies how an intelligent agent takes actions in a given environment to maximize the cumulative reward. In this paper, we propose a new MAC protocol based on the Q-learning technique of reinforcement learning to improve the performance of the IEEE 802.11 wireless LAN CSMA/CA MAC protocol. Furthermore, the operation of each access point (AP) and station is proposed. The AP adjusts the value of the contention window (CW), which is the range for determining the backoff number of the station, according to the wireless traffic load. The station improves the performance by selecting an optimal backoff number with the lowest packet collision rate and the highest transmission success rate through Q-learning within the CW value transmitted from the AP. The result of the performance evaluation through computer simulations showed that the proposed scheme has a higher throughput than that of the existing CSMA/CA scheme.
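
On the station side, the described learning reduces to scoring backoff counters within the AP-supplied CW. The sketch below is a stateless, bandit-style simplification of that idea; the AP's CW adaptation and the reward constants are assumptions.

```python
import random

alpha, eps = 0.2, 0.1
q_backoff = {}                      # backoff counter -> estimated value

def pick_backoff(cw):
    """Epsilon-greedy choice of a backoff counter within the current CW."""
    if random.random() < eps:
        return random.randrange(cw)
    return max(range(cw), key=lambda b: q_backoff.get(b, 0.0))

def observe(b, success):
    """+1 on an ACKed frame, -1 on a collision; running-average update."""
    r = 1.0 if success else -1.0
    old = q_backoff.get(b, 0.0)
    q_backoff[b] = old + alpha * (r - old)
```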

Neural-Q method based on KFD regression (KFD 회귀를 이용한 뉴럴-큐 기법)

  • 조원희;김영일;박주영
    • Proceedings of the Korean Institute of Intelligent Systems Conference / 2003.05a / pp.85-88 / 2003
  • Q-learning, one approach to reinforcement learning, has recently been applied successfully to the Linear Quadratic Regulation (LQR) problem. In particular, it can solve the problem through learning from appropriate inputs and outputs alone, without specific information about the system model parameters, which makes it very practical in some situations. The Neural-Q method replaces the Q-value of Q-learning with the output of an MLP (multilayer perceptron) neural network, allowing optimal control problems for nonlinear systems to be handled. However, the Neural-Q method first fixes the network structure and then trains it with the backpropagation algorithm, so the structure must be determined by trial and error, and backpropagation may drive the connection weights to a local optimum. In this paper, we propose a Q-function approximation scheme that uses KFD regression as the learning tool for Neural-Q and derive the related equations. Simulations then examine the applicability of the proposed Neural-Q method.

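To illustrate the general idea of swapping the MLP for a kernel regressor, here is plain Gaussian-kernel ridge regression fitting Q-targets over state-action features. It stands in for, and does not reproduce, the paper's KFD regression.

```python
import numpy as np

def rbf_kernel(A, B, sigma=1.0):
    """Gaussian kernel matrix between rows of A (n, d) and B (m, d)."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

class KernelQ:
    """Fit Q(s, a) targets with kernel ridge regression over (s, a) features."""
    def __init__(self, lam=1e-3, sigma=1.0):
        self.lam, self.sigma = lam, sigma

    def fit(self, X, y):
        K = rbf_kernel(X, X, self.sigma)
        self.X = X
        self.coef = np.linalg.solve(K + self.lam * np.eye(len(X)), y)

    def predict(self, Xq):
        return rbf_kernel(Xq, self.X, self.sigma) @ self.coef
```

Unlike a fixed MLP, the model's capacity grows with the data, and fitting is a linear solve rather than backpropagation, which sidesteps the local-optimum issue raised above.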