• Title/Summary/Keyword: Q-learning system

A Performance Improvement Technique for Nash Q-learning using Macro-Actions (매크로 행동을 이용한 내시 Q-학습의 성능 향상 기법)

  • Sung, Yun-Sik;Cho, Kyun-Geun;Um, Ky-Hyun
    • Journal of Korea Multimedia Society
    • /
    • v.11 no.3
    • /
    • pp.353-363
    • /
    • 2008
  • A multi-agent system has a longer learning period and larger state spaces than a single-agent system. In this paper, we suggest a new method to reduce the learning time of Nash Q-learning in a multi-agent environment. We apply macro-actions to Nash Q-learning to improve the learning speed. In the Nash Q-learning scheme, when agents select actions, rewards are accumulated as in macro-actions. In the experiments, we compare Nash Q-learning using macro-actions with general Nash Q-learning. First, we observed how many times the agents achieved their goals. The results show that agents using Nash Q-learning with 4 macro-actions perform 9.46% better than agents using only 4 primitive actions. Second, when agents use macro-actions, Q-values are accumulated 2.6 times more. Finally, agents using macro-actions select about 44% fewer actions. As a result, agents select fewer actions, macro-actions improve the Q-value updates, and the agents' learning speed improves.

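To make the update concrete, here is a minimal tabular sketch of the macro-action idea the abstract describes: rewards collected while a macro-action runs are accumulated with discounting, and a single Q-backup is made when the macro finishes. The corridor environment, macro set, and parameters are illustrative assumptions, not the paper's setup (the paper uses Nash Q-learning over joint actions; this sketch shows only the single-agent reward-accumulation mechanics).

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D corridor: states 0..9, primitive actions {0: left, 1: right},
# reward 1 on reaching state 9. Macro-actions are fixed primitive
# sequences. All names and values here are illustrative.
N = 10
MACROS = [[0] * 3, [1] * 3, [0], [1]]           # 4 macro-actions

def step(s, a):
    s2 = max(0, min(N - 1, s + (1 if a == 1 else -1)))
    return s2, float(s2 == N - 1)

def run_macro(Q, s0, m, alpha=0.1, gamma=0.95):
    """Execute macro m, accumulating discounted reward, then back up once."""
    s, R, g = s0, 0.0, 1.0
    for a in MACROS[m]:
        s, r = step(s, a)
        R += g * r                              # reward accumulation
        g *= gamma                              # g = gamma ** steps so far
    Q[s0, m] += alpha * (R + g * Q[s].max() - Q[s0, m])
    return s

Q = np.zeros((N, len(MACROS)))
s = 0
for _ in range(2000):                           # epsilon-greedy training loop
    m = int(rng.integers(len(MACROS))) if rng.random() < 0.2 else int(Q[s].argmax())
    s = run_macro(Q, s, m)
    if s == N - 1:
        s = 0                                   # restart the episode at the goal
```

Because each macro triggers only one backup, the bootstrap term is discounted by gamma raised to the macro's length, which is what lets fewer action selections still propagate Q-values quickly.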

Solving Continuous Action/State Problem in Q-Learning Using Extended Rule Based Fuzzy Inference System

  • Kim, Min-Soeng;Lee, Ju-Jang
    • Transactions on Control, Automation and Systems Engineering
    • /
    • v.3 no.3
    • /
    • pp.170-175
    • /
    • 2001
  • Q-learning is a kind of reinforcement learning in which the agent solves the given task based on rewards received from the environment. Most research in the field of Q-learning has focused on discrete domains, although the environment with which the agent must interact is generally continuous. Thus we need methods that make Q-learning applicable to continuous problem domains. In this paper, an extended fuzzy rule is proposed so that it can incorporate Q-learning. The interpolation technique, which is widely used in memory-based learning, is adopted to represent the appropriate Q-value for the current state-action pair in each extended fuzzy rule. The resulting structure, based on the fuzzy inference system, is capable of handling continuous states in the environment. The effectiveness of the proposed structure is shown through simulation on the cart-pole system.

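As a rough illustration of the interpolation idea, the sketch below keeps one q-value per (rule, candidate action) and forms Q(s, a) as a firing-strength-weighted average over the rules; the TD error is then distributed back over the rules by the same weights. The membership functions, candidate actions, and update rule are assumptions for illustration, not the paper's exact extended-rule formulation.

```python
import numpy as np

# Fuzzy-inference Q-function over a continuous 1-D state: each rule has
# a Gaussian membership over the state and a q-value per candidate
# action. Centers, widths, and actions are illustrative assumptions.
centers = np.linspace(-1.0, 1.0, 5)             # rule centers over the state
sigma = 0.3
actions = np.array([-1.0, 0.0, 1.0])            # candidate actions
q = np.zeros((len(centers), len(actions)))      # per-rule q-values

def firing(s):
    """Normalized firing strength of each rule at state s."""
    w = np.exp(-((s - centers) ** 2) / (2 * sigma ** 2))
    return w / w.sum()

def Q(s):
    return firing(s) @ q                        # interpolated Q(s, .) per action

def update(s, a_idx, r, s2, alpha=0.1, gamma=0.95):
    """Distribute the TD error over the rules by firing strength."""
    td = r + gamma * Q(s2).max() - Q(s)[a_idx]
    q[:, a_idx] += alpha * td * firing(s)

print(Q(0.2))                                   # interpolated Q-values at s = 0.2
```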

A Simulation of Vehicle Parking Distribution System for Local Cultural Festival with Queuing Theory and Q-Learning Algorithm (대기행렬이론과 Q-러닝 알고리즘을 적용한 지역문화축제 진입차량 주차분산 시뮬레이션 시스템)

  • Cho, Youngho;Seo, Yeong Geon;Jeong, Dae-Yul
    • The Journal of Information Systems
    • /
    • v.29 no.2
    • /
    • pp.131-147
    • /
    • 2020
  • Purpose The purpose of this study is to develop an intelligent vehicle parking distribution system based on a LoRa network under the traffic congestion that occurs during a cultural festival in a local city. This paper proposes a parking dispatch and distribution system that uses a Q-learning algorithm to rapidly disperse, in real time, the traffic that surges because of inbound vehicles from outside the city, and to increase the parking probability at the parking lots distributed across the city. Design/methodology/approach The system gets real-time information from the IoT sensor network (LoRa network), which helps resolve the sudden increase in traffic and the parking bottlenecks during a local cultural festival. We applied the simulation system with a queuing model to the Yudeung Festival in Jinju, Korea. We proposed a Q-learning algorithm that can change the learning policy by setting the acceptability value of each parking lot as a threshold on the route from the Jinju highway IC (Interchange) to the 7 parking lots. The LoRa network platform lets each vehicle browse parking resource information in real time. The system periodically updates the Q-table using the Q-learning algorithm as soon as it receives information from the parking lots. Queuing theory with a Poisson arrival distribution is used to obtain the probability distribution function, and the Dijkstra algorithm is used to find the shortest distance. Findings This paper suggests a simulation test to verify the efficiency of the Q-learning algorithm under the heavy traffic congestion of a local festival. As a result of the simulation, the proposed algorithm performed well even when each parking lot was somewhat saturated. When an intelligent learning system such as a Q-learning algorithm is applied, vehicles can be distributed more effectively to lots with a high parking probability when the inflow from outside increases rapidly at a specific time, such as during a local cultural festival.
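
A toy sketch of how the queuing and learning pieces could fit together follows: a Poisson model estimates whether a lot can absorb upcoming arrivals, and a small Q-table over (inflow level, lot) is updated from parking outcomes, with an acceptability threshold filtering out saturated lots. Apart from the 7-lot count taken from the abstract, the rates, rewards, and threshold value are illustrative assumptions.

```python
import numpy as np
from math import exp, factorial

N_LOTS = 7
rng = np.random.default_rng(1)

def p_free(arrival_rate, capacity_left):
    """P(arrivals <= remaining spaces) under a Poisson arrival model."""
    return sum(arrival_rate**k * exp(-arrival_rate) / factorial(k)
               for k in range(capacity_left + 1))

Q = np.zeros((3, N_LOTS))                       # rows: low/medium/high inflow
THRESHOLD = 0.8                                 # assumed lot acceptability cut-off

def dispatch(level, occupancy, eps=0.1):
    """Pick a lot epsilon-greedily among lots under the occupancy threshold."""
    ok = np.flatnonzero(occupancy < THRESHOLD)
    if len(ok) == 0:
        ok = np.arange(N_LOTS)                  # all lots saturated: fall back
    if rng.random() < eps:
        return int(rng.choice(ok))
    return int(ok[Q[level, ok].argmax()])

def update(level, lot, parked, dist_km, alpha=0.1):
    """Reward parking success, penalize travel distance (bandit-style backup)."""
    r = (1.0 if parked else -1.0) - 0.05 * dist_km
    Q[level, lot] += alpha * (r - Q[level, lot])

print(p_free(3.0, 5))                           # chance a lot absorbs ~3 arrivals
lot = dispatch(2, occupancy=np.array([0.9, 0.6, 0.7, 0.95, 0.5, 0.8, 0.4]))
update(2, lot, parked=True, dist_km=2.3)
```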

Q-learning for intersection traffic flow Control based on agents

  • Zhou, Xuan;Chong, Kil-To
    • Proceedings of the IEEK Conference
    • /
    • 2009.05a
    • /
    • pp.94-96
    • /
    • 2009
  • In this paper, we present a Q-learning method for adaptive traffic signal control on the basis of multi-agent technology. The structure is composed of six phase agents and one intersection agent. A wireless communication network provides the possibility of cooperation among the agents. As one kind of reinforcement learning, Q-learning is adopted as the algorithm of the control mechanism, which can acquire optimal control strategies from delayed rewards; furthermore, we adopt a dynamic learning method instead of a static one, which is more practical. Simulation results indicate that it is more effective than a traditional signal system.

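The abstract leaves the state and reward design open; the sketch below shows one generic way such a controller is often set up in tabular form: the state encodes coarse per-phase queue lengths, the action picks which of the six phases to serve, and the delayed reward is the negative total queue. This encoding is an assumption for illustration, not the paper's design.

```python
import numpy as np

N_PHASES = 6
BINS = 3                                        # coarse queue levels per phase
rng = np.random.default_rng(2)

def encode(queues):
    """Map per-phase queue lengths to one discrete state index."""
    code = 0
    for qlen in queues:
        code = code * BINS + min(int(qlen) // 5, BINS - 1)
    return code

Q = np.zeros((BINS ** N_PHASES, N_PHASES))      # 729 states x 6 phase choices

def choose_phase(state, eps=0.1):
    if rng.random() < eps:
        return int(rng.integers(N_PHASES))
    return int(Q[state].argmax())

def update(state, phase, queues_after, alpha=0.1, gamma=0.9):
    """One delayed-reward backup after serving the chosen phase."""
    s2 = encode(queues_after)
    r = -float(sum(queues_after))               # delayed reward: total queue
    Q[state, phase] += alpha * (r + gamma * Q[s2].max() - Q[state, phase])
    return s2

s = encode([12, 3, 0, 7, 9, 1])
s = update(s, choose_phase(s), [8, 5, 2, 7, 4, 3])
```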

Design and Implementation of Parking Guidance System Based on Internet of Things(IoT) Using Q-learning Model (Q-learning 모델을 이용한 IoT 기반 주차유도 시스템의 설계 및 구현)

  • Ji, Yong-Joo;Choi, Hak-Hui;Kim, Dong-Seong
    • IEMEK Journal of Embedded Systems and Applications
    • /
    • v.11 no.3
    • /
    • pp.153-162
    • /
    • 2016
  • This paper proposes an optimal dynamic resource allocation method for an IoT (Internet of Things) parking guidance system using a Q-learning resource allocation model. In the proposed method, resource allocation using a forecasting model based on Q-learning is employed for optimal utilization of the parking guidance system. To demonstrate the efficiency and availability of the proposed method, it is verified by computer simulation and a practical testbed. Through the simulation results, this paper shows that the proposed method can enhance total throughput, decrease the penalty fee issued by the SLA (Service Level Agreement), and reduce response time under a dynamically changing number of users.
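
The abstract does not spell out the allocation mechanics, so the following is only one plausible shape for them: a Q-table over demand levels whose action is how many resource units to provision, with a reward trading throughput against an SLA penalty. Every quantity here (unit capacity, costs, the latency bound) is an invented placeholder.

```python
import numpy as np

MAX_UNITS = 8
rng = np.random.default_rng(3)
Q = np.zeros((10, MAX_UNITS))                   # state: demand level 0..9

def reward(units, demand):
    served = min(units * 10, demand)            # assume each unit serves ~10 users
    latency = demand / max(units, 1)
    sla_penalty = 5.0 if latency > 20 else 0.0  # assumed SLA response-time bound
    return served - 0.5 * units - sla_penalty   # throughput - cost - penalty

def allocate(level, eps=0.1):
    if rng.random() < eps:
        return int(rng.integers(MAX_UNITS))
    return int(Q[level].argmax())

def update(level, units, next_level, r, alpha=0.1, gamma=0.9):
    Q[level, units] += alpha * (r + gamma * Q[next_level].max() - Q[level, units])

level = 4                                       # current demand level
units = allocate(level)
update(level, units, 5, reward(units, demand=42))
```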

Applying CEE (CrossEntropyError) to improve performance of Q-Learning algorithm (Q-learning 알고리즘이 성능 향상을 위한 CEE(CrossEntropyError)적용)

  • Kang, Hyun-Gu;Seo, Dong-Sung;Lee, Byeong-seok;Kang, Min-Soo
    • Korean Journal of Artificial Intelligence
    • /
    • v.5 no.1
    • /
    • pp.1-9
    • /
    • 2017
  • Recently, the Q-Learning algorithm, one kind of reinforcement learning, has mainly been used to implement artificial intelligence systems in combination with deep learning. Much research is ongoing to improve the performance of Q-Learning, and the purpose of this study is to do so by applying Cross Entropy Error to the loss function of the Q-Learning algorithm. Since the mean squared error used in Q-Learning makes it difficult to measure the exact error rate, the Cross Entropy Error, known to be highly accurate, is applied to the loss function instead. Experimental results show a success rate of about 12% with the Mean Squared Error used in existing reinforcement learning, versus about 36% with the Cross Entropy Error used in deep learning.
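
A minimal numpy sketch contrasting the two loss functions follows, for a network head with one output per action. How the paper maps Q-targets to a cross-entropy target is not specified in the abstract; here the target is a one-hot vector on the greedy action, which is one common reading, not necessarily the authors' construction.

```python
import numpy as np

def mse_loss(q_pred, q_target):
    """Regression view: squared distance to bootstrapped Q-targets."""
    return float(np.mean((q_pred - q_target) ** 2))

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def cross_entropy_loss(q_pred, target_action, eps=1e-12):
    """Classification view: negative log-probability of the target action."""
    p = softmax(q_pred)                          # action probabilities
    return float(-np.log(p[target_action] + eps))

q_pred = np.array([1.0, 2.5, 0.3])               # network outputs per action
q_target = np.array([1.2, 3.0, 0.1])             # bootstrapped Q-targets
print(mse_loss(q_pred, q_target))                # MSE on raw Q-values
print(cross_entropy_loss(q_pred, int(q_target.argmax())))  # CEE on greedy action
```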

Avoidance Behavior of Small Mobile Robots based on the Successive Q-Learning

  • Kim, Min-Soo
    • Institute of Control, Robotics and Systems: Conference Proceedings
    • /
    • 2001.10a
    • /
    • pp.164.1-164
    • /
    • 2001
  • Q-learning is a recent reinforcement learning algorithm that does not need a model of the environment, and it is a suitable approach for learning behaviors for autonomous agents. But when it is applied to multi-agent learning with many I/O states, it is usually too complex and slow. To overcome this problem in multi-agent learning systems, we propose the successive Q-learning algorithm. Successive Q-learning divides the state-action pairs that agents can have into several Q-functions, so it can reduce complexity and the amount of computation. This algorithm is suitable for multi-agent learning in a dynamically changing environment. The proposed successive Q-learning algorithm is applied to the prey-predator problem with one prey and two predators, and its effectiveness is verified by the efficient avoidance ability of the prey agent.

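One way to read the decomposition, sketched below, is to keep one small Q-table per predator instead of a single table over the joint state (two tables of size N x A rather than one of size N^2 x A) and to combine them when the prey selects an action. The combination-by-sum and the state coding are assumptions for illustration; the paper's exact division of state-action pairs may differ.

```python
import numpy as np

N = 20                                          # discretized relative positions
N_ACTIONS = 4
rng = np.random.default_rng(4)
Q1 = np.zeros((N, N_ACTIONS))                   # sub-Q w.r.t. predator 1
Q2 = np.zeros((N, N_ACTIONS))                   # sub-Q w.r.t. predator 2

def select_action(s1, s2, eps=0.1):
    """Combine the sub-Q preferences (here by summation) to pick a move."""
    if rng.random() < eps:
        return int(rng.integers(N_ACTIONS))
    return int((Q1[s1] + Q2[s2]).argmax())

def update(s1, s2, a, r, s1n, s2n, alpha=0.1, gamma=0.95):
    """Each sub-Q gets the same reward, in its own smaller state space."""
    Q1[s1, a] += alpha * (r + gamma * Q1[s1n].max() - Q1[s1, a])
    Q2[s2, a] += alpha * (r + gamma * Q2[s2n].max() - Q2[s2, a])

a = select_action(3, 17)
update(3, 17, a, 1.0, 4, 16)
```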

Solving the Gale-Shapley Problem by Ant-Q learning (Ant-Q 학습을 이용한 Gale-Shapley 문제 해결에 관한 연구)

  • Kim, Hyun;Chung, Tae-Choong
    • The KIPS Transactions:PartB
    • /
    • v.18B no.3
    • /
    • pp.165-172
    • /
    • 2011
  • In this paper, we propose the Ant-Q learning algorithm [1], which mimics the habits of biological ants, as a new way to solve the Stable Marriage Problem (SMP) [3] presented by Gale-Shapley [2]. The goal of SMP is to find an optimal matching for stable marriage based on the preference lists (PL). The problem with the Gale-Shapley algorithm is that it yields a stable matching optimal only for the males (or only for the females). We propose another way to satisfy various requirements for SMP. ACS (Ant Colony System) is a swarm intelligence method that finds optimal solutions by using the pheromone of ants. We improve the ACS technique by adding the Q-learning [9] concept. This Ant-Q method can solve SMP under various requirements. The experimental results show that the proposed method works well on the problem.
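
For flavor, here is a compact Ant-Q-style construction for the stable-marriage setting: AQ(m, w) plays the role of a pheromone/Q-value, the heuristic favors mutually preferred pairs, each iteration builds a full matching, and the best matching found so far (fewest blocking pairs) is reinforced. The parameter values, the heuristic, and the blocking-pair scoring are illustrative assumptions, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(5)
N = 6                                           # men and women
pref_m = np.array([rng.permutation(N) for _ in range(N)])  # rank: lower = better
pref_w = np.array([rng.permutation(N) for _ in range(N)])
AQ = np.ones((N, N))                            # Ant-Q value per (man, woman)
HE = 1.0 / (1.0 + pref_m + pref_w.T)            # mutual-preference heuristic

def build_matching(beta=2.0, q0=0.7):
    """Each man picks among still-free women, Ant-Q style."""
    match = np.full(N, -1)
    free_w = set(range(N))
    for m in range(N):
        ws = np.array(sorted(free_w))
        score = AQ[m, ws] * HE[m, ws] ** beta
        if rng.random() < q0:                   # exploit
            w = ws[score.argmax()]
        else:                                   # biased exploration
            w = rng.choice(ws, p=score / score.sum())
        match[m] = w
        free_w.remove(int(w))
    return match

def blocking_pairs(match):
    """Count pairs (m, w) that both prefer each other to their partners."""
    inv = np.empty(N, dtype=int)
    inv[match] = np.arange(N)                   # woman -> her matched man
    count = 0
    for m in range(N):
        for w in range(N):
            if w != match[m] and pref_m[m, w] < pref_m[m, match[m]] \
               and pref_w[w, m] < pref_w[w, inv[w]]:
                count += 1
    return count

best, best_bp = None, N * N
for it in range(200):                           # Ant-Q iterations
    match = build_matching()
    bp = blocking_pairs(match)
    if bp < best_bp:
        best, best_bp = match, bp
    AQ *= 0.9                                   # evaporation
    for m, w in enumerate(best):                # reinforce best-so-far pairs
        AQ[m, w] += 1.0 / (1.0 + best_bp)
```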

Online Reinforcement Learning to Search the Shortest Path in Maze Environments (미로 환경에서 최단 경로 탐색을 위한 실시간 강화 학습)

  • Kim, Byeong-Cheon;Kim, Sam-Geun;Yun, Byeong-Ju
    • The KIPS Transactions:PartB
    • /
    • v.9B no.2
    • /
    • pp.155-162
    • /
    • 2002
  • Reinforcement learning is a learning method that uses trial and error to learn by interacting with dynamic environments. It is classified into online reinforcement learning and delayed reinforcement learning. In this paper, we propose an online reinforcement learning system (ONRELS: ONline REinforcement Learning System). ONRELS updates the estimated value of all the selectable (state, action) pairs before making a state transition at the current state. After compressing the state space of the maze environment, ONRELS learns by interacting with the compressed environment through trial and error. Through experiments, we show that ONRELS can search the shortest path faster than Q-learning using TD-error and $Q(\lambda)$-learning using $TD(\lambda)$ in maze environments.
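
The distinctive step, refreshing every selectable (state, action) pair before moving, can be pictured with the grid-world sketch below, where each action at the current cell gets a one-step lookahead backup before the agent transitions. The grid, rewards, and backup form are assumptions based only on the abstract, and the state-space compression is omitted.

```python
import numpy as np

H, W = 5, 5
GOAL = (4, 4)
MOVES = [(-1, 0), (1, 0), (0, 1), (0, -1)]      # up, down, right, left
Q = np.zeros((H, W, len(MOVES)))
rng = np.random.default_rng(6)

def step(s, a):
    r, c = s
    dr, dc = MOVES[a]
    r2, c2 = min(max(r + dr, 0), H - 1), min(max(c + dc, 0), W - 1)
    return (r2, c2), (1.0 if (r2, c2) == GOAL else -0.01)

def sweep_and_move(s, alpha=0.2, gamma=0.95, eps=0.1):
    """Update all selectable actions at s (lookahead backups), then move."""
    for a in range(len(MOVES)):                 # refresh every (s, a) pair
        s2, r = step(s, a)                      # one-step lookahead
        Q[s][a] += alpha * (r + gamma * Q[s2].max() - Q[s][a])
    a = int(rng.integers(len(MOVES))) if rng.random() < eps else int(Q[s].argmax())
    return step(s, a)[0]

s = (0, 0)
for _ in range(5000):
    s = sweep_and_move(s)
    if s == GOAL:
        s = (0, 0)                              # restart the episode at the goal
```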

Optimization of Stock Trading System based on Multi-Agent Q-Learning Framework (다중 에이전트 Q-학습 구조에 기반한 주식 매매 시스템의 최적화)

  • Kim, Yu-Seop;Lee, Jae-Won;Lee, Jong-Woo
    • The KIPS Transactions:PartB
    • /
    • v.11B no.2
    • /
    • pp.207-212
    • /
    • 2004
  • This paper presents a reinforcement learning framework for stock trading systems. Trading system parameters are optimized by the Q-learning algorithm, and neural networks are adopted for value approximation. In this framework, cooperative multiple agents are used to efficiently integrate global trend prediction and local trading strategy to obtain better trading performance. Agents communicate with each other by sharing training episodes and learned policies, while keeping the overall scheme of conventional Q-learning. Experimental results on KOSPI 200 show that a trading system based on the proposed framework outperforms the market average and makes appreciable profits. Furthermore, in view of risk management, the system is superior to a system trained by supervised learning.
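
To suggest the shape of the value-approximation component, the sketch below uses a linear approximator (standing in for the paper's neural network) with a semi-gradient Q-learning update over market features; the features, action set, and rewards are placeholders, and the multi-agent episode sharing is omitted.

```python
import numpy as np

N_FEATURES = 8
ACTIONS = ("buy", "hold", "sell")
rng = np.random.default_rng(7)
W = np.zeros((len(ACTIONS), N_FEATURES))        # one weight vector per action

def q_values(x):
    return W @ x                                # approximate Q(s, .) per action

def act(x, eps=0.1):
    if rng.random() < eps:
        return int(rng.integers(len(ACTIONS)))
    return int(q_values(x).argmax())

def update(x, a, r, x_next, alpha=0.01, gamma=0.99):
    """Semi-gradient Q-learning step on the linear approximator."""
    td = r + gamma * q_values(x_next).max() - q_values(x)[a]
    W[a] += alpha * td * x

x = rng.normal(size=N_FEATURES)                 # today's (made-up) feature vector
a = act(x)
update(x, a, r=0.5, x_next=rng.normal(size=N_FEATURES))
```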