• Title/Summary/Keyword: Q-learning algorithm

Area-Based Q-learning Algorithm to Search Target Object of Multiple Robots (다수 로봇의 목표물 탐색을 위한 Area-Based Q-learning 알고리즘)

  • Yoon, Han-Ul;Sim, Kwee-Bo
    • Journal of the Korean Institute of Intelligent Systems / v.15 no.4 / pp.406-411 / 2005
  • In this paper, we present area-based Q-learning to search for a target object using multiple robots. To search for the target in a Markovian space, the robots should recognize their surroundings at their current locations and generate rules to act upon by themselves. Under area-based Q-learning, a robot first obtains six distances from itself to the environment using infrared sensors allocated hexagonally around it. Second, it calculates six areas from those distances and then takes an action, i.e., turns and moves toward where the widest space is guaranteed. After the action is taken, the Q-value at that state is updated by the corresponding formula. We set up an experimental environment with five small mobile robots, obstacles, and a target object, and tried to search for the target object while navigating an unknown hallway where some obstacles were placed. At the end of this paper, we present the results of three algorithms: random search, area-based action making (ABAM), and hexagonal area-based Q-learning.
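
A minimal Python sketch of the hexagonal area computation and tabular Q-update described in the abstract above; the triangle-area formula, state encoding, and learning parameters are assumptions for illustration, not values from the paper.

```python
import math
import random

ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1   # assumed learning parameters

def hexagonal_areas(distances):
    """Approximate the six areas spanned by adjacent IR distance readings.

    Neighboring rays are 60 degrees apart, so each pair spans a triangle
    of area 1/2 * d_i * d_j * sin(60 degrees).
    """
    return [0.5 * distances[i] * distances[(i + 1) % 6] * math.sin(math.pi / 3)
            for i in range(6)]

def choose_action(q_table, state, areas):
    """Turn toward the sector with the widest free area (epsilon-greedy)."""
    if random.random() < EPSILON:
        return random.randrange(6)
    # prefer the widest area, breaking ties with the learned Q-values
    return max(range(6), key=lambda a: (areas[a], q_table.get((state, a), 0.0)))

def update_q(q_table, state, action, reward, next_state):
    """Standard one-step Q-learning update at the visited state."""
    old = q_table.get((state, action), 0.0)
    next_best = max(q_table.get((next_state, a), 0.0) for a in range(6))
    q_table[(state, action)] = old + ALPHA * (reward + GAMMA * next_best - old)
```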

Fuzzy Q-learning using Distributed Eligibility (분포 기여도를 이용한 퍼지 Q-learning)

  • 정석일;이연정
    • Journal of the Korean Institute of Intelligent Systems / v.11 no.5 / pp.388-394 / 2001
  • Reinforcement learning is a kind of unsupervised learning method in which an agent acquires control rules from experiences gained by interacting with its environment. Eligibility is used to resolve the credit-assignment problem, one of the important problems in reinforcement learning. Conventional eligibilities such as the accumulating eligibility and the replacing eligibility make ineffective use of the rewards acquired in the learning process, since only the one executed action for a visited state is learned. In this paper, we propose a new eligibility, called the distributed eligibility, with which not only the executed action but also neighboring actions in a visited state are learned. The fuzzy Q-learning algorithm using the proposed eligibility is applied to a cart-pole balancing problem, which shows the superiority of the proposed method over conventional methods in terms of learning speed.
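
A rough sketch of the "distributed eligibility" idea described above, in a tabular Q(lambda) style: the executed action gets full eligibility and its neighboring actions get a reduced share, so they also receive credit. The neighborhood weighting and decay constants are assumptions, and the paper's fuzzy-rule machinery is omitted.

```python
ALPHA, GAMMA, LAMBDA = 0.1, 0.95, 0.8   # assumed constants

def assign_distributed_eligibility(elig, state, action, n_actions, spread=0.5):
    """Replacing eligibility for the executed action, a reduced share for
    its immediate neighbors (the 'distributed' part)."""
    elig[(state, action)] = 1.0
    for neighbor in (action - 1, action + 1):
        if 0 <= neighbor < n_actions:
            elig[(state, neighbor)] = max(elig.get((state, neighbor), 0.0), spread)

def td_update(q, elig, td_error):
    """Propagate the TD error to every eligible state-action pair, then decay."""
    for key, e in list(elig.items()):
        q[key] = q.get(key, 0.0) + ALPHA * td_error * e
        elig[key] = GAMMA * LAMBDA * e
```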

Reward Design of Reinforcement Learning for Development of Smart Control Algorithm (스마트 제어알고리즘 개발을 위한 강화학습 리워드 설계)

  • Kim, Hyun-Su;Yoon, Ki-Yong
    • Journal of Korean Association for Spatial Structures / v.22 no.2 / pp.39-46 / 2022
  • Recently, machine learning has been widely used to solve optimization problems in various engineering fields. In this study, machine learning is applied to the development of a control algorithm for a smart control device for the reduction of seismic responses. For this purpose, the Deep Q-network (DQN), one of the reinforcement learning algorithms, was employed to develop the control algorithm. A single-degree-of-freedom (SDOF) structure with a smart tuned mass damper (TMD) was used as an example structure. The smart TMD system was composed of an MR (magnetorheological) damper instead of a passive damper. The reward design of the reinforcement learning mainly affects the control performance of the smart TMD. Various hyper-parameters were investigated to optimize the control performance of the DQN-based control algorithm. Usually, decreasing the time step of a numerical simulation is desirable to increase the accuracy of the simulation results. However, the numerical simulation results showed that decreasing the time step for reward calculation might decrease the control performance of the DQN-based control algorithm. Therefore, a proper time step for reward calculation should be selected in the DQN training process.
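
A hedged sketch of the kind of reward-step separation the abstract points at: the simulation advances with a fine time step, while the reward fed to the DQN is evaluated over a coarser, separately chosen reward step. The response quantity, weights, step sizes, and interface names are illustrative assumptions, not values from the paper.

```python
import numpy as np

SIM_DT = 0.001        # assumed simulation time step [s]
REWARD_DT = 0.02      # assumed (coarser) reward-calculation time step [s]
STEPS_PER_REWARD = int(REWARD_DT / SIM_DT)

def reward_from_window(displacements, uncontrolled_rms):
    """Reward based on response reduction over one reward window.

    Compares the controlled RMS displacement against an uncontrolled
    reference, so a positive reward means the smart TMD helped.
    """
    controlled_rms = np.sqrt(np.mean(np.square(displacements)))
    return (uncontrolled_rms - controlled_rms) / uncontrolled_rms

def run_episode(simulate_step, agent, uncontrolled_rms, n_steps=10_000):
    """Accumulate fine-step responses; hand the agent a reward every window."""
    window = []
    for step in range(n_steps):
        displacement = simulate_step()          # one fine simulation step (assumed callable)
        window.append(displacement)
        if (step + 1) % STEPS_PER_REWARD == 0:  # reward on the coarser grid
            agent.observe_reward(reward_from_window(window, uncontrolled_rms))
            window.clear()
```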

Equal Energy Consumption Routing Protocol Algorithm Based on Q-Learning for Extending the Lifespan of Ad-Hoc Sensor Network (애드혹 센서 네트워크 수명 연장을 위한 Q-러닝 기반 에너지 균등 소비 라우팅 프로토콜 기법)

  • Kim, Ki Sang;Kim, Sung Wook
    • KIPS Transactions on Computer and Communication Systems / v.10 no.10 / pp.269-276 / 2021
  • Recently, smart sensors are used in various environments, and the implementation of ad-hoc sensor networks (ASNs) is a hot research topic. Unfortunately, traditional sensor network routing algorithms focus on specific control issues and cannot be directly applied to ASN operation. In this paper, we propose a new routing protocol based on Q-learning. The main challenge of the proposed approach is to extend the lifetime of ASNs through efficient energy allocation while maintaining balanced system performance. The proposed method enhances the Q-learning effect by considering various environmental factors. When a transmission fails, a node penalty is accumulated so as to increase the probability of successful communication. In particular, each node stores the Q-values of its adjacent nodes in its own Q-table. Every time a data transfer is executed, the Q-values are updated and accumulated so that the node learns to select the optimal routing path. Simulation results confirm that the proposed method can choose an energy-efficient routing path and achieves excellent network performance compared with existing ASN routing protocols.
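
A simplified sketch of a per-node Q-table routing step in the spirit described above; the reward shaping (residual-energy term, failure penalty) and constants are assumptions for illustration, not the paper's exact formulation.

```python
import random

ALPHA, GAMMA, EPSILON = 0.3, 0.9, 0.1     # assumed learning parameters
FAIL_PENALTY = -1.0                        # assumed penalty accumulated on failure

def select_next_hop(q_values, neighbors):
    """Epsilon-greedy choice among adjacent nodes using the local Q-table."""
    if random.random() < EPSILON:
        return random.choice(neighbors)
    return max(neighbors, key=lambda n: q_values.get(n, 0.0))

def update_after_transmission(q_values, next_hop, success,
                              residual_energy, neighbor_best_q):
    """Reward favors neighbors with more residual energy; failures accumulate
    a penalty so the node learns to avoid unreliable or depleted routes."""
    reward = residual_energy if success else FAIL_PENALTY
    old = q_values.get(next_hop, 0.0)
    q_values[next_hop] = old + ALPHA * (reward + GAMMA * neighbor_best_q - old)
```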

Intelligent Transportation System using Q-Learning (Q-Learning을 이용한 Intelligent Transportation System)

  • 박명수;김표재;최진영
    • Proceedings of the IEEK Conference / 2003.07d / pp.1299-1302 / 2003
  • In this paper, we propose a new method which can efficiently provide users with a path to a target place. It stores the states of the roads to the target place in the form of a Q-table and finds a proper path using the Q-table. The Q-table is updated with information about real traffic reported by users. This method can provide a proper path using less storage and less computation time than the conventional method, which stores the entire road traffic information and finds a path by a graph search algorithm.
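
A minimal illustrative sketch of the Q-table-based route lookup the abstract describes; the state/action encoding (intersection, outgoing road) and the travel-time-based reward are assumptions.

```python
ALPHA, GAMMA = 0.2, 0.9    # assumed learning parameters

def report_traffic(q_table, intersection, road, travel_time, next_best_q):
    """Update the Q-value of taking `road` at `intersection` from a user report.

    Shorter reported travel times yield higher (less negative) rewards.
    """
    old = q_table.get((intersection, road), 0.0)
    reward = -travel_time
    q_table[(intersection, road)] = old + ALPHA * (reward + GAMMA * next_best_q - old)

def next_road(q_table, intersection, outgoing_roads):
    """Greedy lookup: no graph search, just the learned Q-values at this node."""
    return max(outgoing_roads, key=lambda r: q_table.get((intersection, r), 0.0))
```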

A Strategy for improving Performance of Q-learning with Prediction Information (예측 정보를 이용한 Q-학습의 성능 개선 기법)

  • Lee, Choong-Hyeon;Um, Ky-Hyun;Cho, Kyung-Eun
    • Journal of Korea Game Society / v.7 no.4 / pp.105-116 / 2007
  • Nowadays, the learning of agents is more and more useful in game environments. However, it takes a long learning time to produce satisfactory results in games, so a good method is needed to shorten the learning time. In this paper, we present a strategy for improving the learning performance of Q-learning with prediction information. It refers to the chosen action at each state in the Q-learning algorithm, stores the referred value in the P-table of the prediction module, and then searches for values with high frequency in the table. These values are used to renew the secondary compensation value from the Q-table. Our experiments show that our approach yields an average efficiency improvement of 9% after the middle point of the learning experiments, and that the more actions there are in a state space, the higher the performance.
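
An illustrative sketch of the prediction-table idea above: how often an action was chosen in a state is kept in a P-table, and frequently chosen actions receive a small secondary bonus on top of their Q-value. The bonus weight and frequency threshold are assumptions, not the paper's values.

```python
from collections import defaultdict

BONUS_WEIGHT = 0.1       # assumed weight of the secondary compensation
FREQ_THRESHOLD = 5       # assumed minimum count before the bonus applies

p_table = defaultdict(int)   # (state, action) -> how often the action was chosen

def record_choice(state, action):
    """Every action actually executed in a state is logged in the P-table."""
    p_table[(state, action)] += 1

def score(q_table, state, action):
    """Q-value plus a small bonus for actions chosen frequently in this state."""
    bonus = BONUS_WEIGHT if p_table[(state, action)] >= FREQ_THRESHOLD else 0.0
    return q_table.get((state, action), 0.0) + bonus

def choose_action(q_table, state, actions):
    """Greedy selection on the bonus-adjusted score."""
    return max(actions, key=lambda a: score(q_table, state, a))
```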

Traffic Offloading in Two-Tier Multi-Mode Small Cell Networks over Unlicensed Bands: A Hierarchical Learning Framework

  • Sun, Youming;Shao, Hongxiang;Liu, Xin;Zhang, Jian;Qiu, Junfei;Xu, Yuhua
    • KSII Transactions on Internet and Information Systems (TIIS) / v.9 no.11 / pp.4291-4310 / 2015
  • This paper investigates traffic offloading over unlicensed bands for two-tier multi-mode small cell networks. We formulate this problem as a Stackelberg game and apply a hierarchical learning framework to jointly maximize the utilities of both the macro base station (MBS) and the small base stations (SBSs). During the learning process, the MBS behaves as the leader and the SBSs are followers. A pricing mechanism is adopted by the MBS: the price information is first broadcast to all SBSs by the MBS, and then each SBS competes with the other SBSs and takes its best-response strategy to appropriately allocate its traffic load between the licensed and unlicensed bands, taking the traffic-flow payment charged by the MBS into consideration. We then present a hierarchical Q-learning algorithm (HQL) to discover the Stackelberg equilibrium. Additionally, if some extra information can be obtained via feedback, we propose an improved hierarchical Q-learning algorithm (IHQL) to speed up the SBSs' learning process. Last but not least, the convergence of the two proposed algorithms is analyzed. Numerical experiments are presented to validate the proposed schemes and show their effectiveness.
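
A very reduced sketch of the leader-follower learning loop described above: the MBS (leader) learns over a discrete price set while each SBS (follower) learns its licensed/unlicensed traffic split given the broadcast price. The utility functions, action sets, and stateless (repeated stage game) updates are placeholders, not the paper's construction.

```python
import random

ALPHA, EPSILON = 0.1, 0.1   # assumed; stateless Q-updates for a repeated stage game

def epsilon_greedy(q, actions):
    """Pick the best known action most of the time, a random one occasionally."""
    if random.random() < EPSILON:
        return random.choice(actions)
    return max(actions, key=lambda a: q.get(a, 0.0))

def hierarchical_round(mbs_q, prices, sbs_qs, splits, mbs_utility, sbs_utility):
    """One round of leader-follower (hierarchical) Q-learning."""
    price = epsilon_greedy(mbs_q, prices)            # leader broadcasts its price
    responses = []
    for q in sbs_qs:                                  # each follower best-responds
        split = epsilon_greedy(q, splits)
        reward = sbs_utility(price, split)            # assumed follower utility
        q[split] = q.get(split, 0.0) + ALPHA * (reward - q.get(split, 0.0))
        responses.append(split)
    leader_reward = mbs_utility(price, responses)     # leader observes the outcome
    old = mbs_q.get(price, 0.0)
    mbs_q[price] = old + ALPHA * (leader_reward - old)
    return price, responses
```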

Implementation of the Agent using Universal On-line Q-learning by Balancing Exploration and Exploitation in Reinforcement Learning (강화 학습에서의 탐색과 이용의 균형을 통한 범용적 온라인 Q-학습이 적용된 에이전트의 구현)

  • 박찬건;양성봉
    • Journal of KIISE: Software and Applications / v.30 no.7_8 / pp.672-680 / 2003
  • A shopbot is a software agent whose goal is to maximize a buyer's satisfaction by automatically gathering price and quality information of goods, as well as service information, from on-line sellers. In response to shopbots' activities, sellers on the Internet need agents called pricebots that can help them maximize their own profits. In this paper we adopt Q-learning, one of the model-free reinforcement learning methods, as the price-setting algorithm of pricebots. A Q-learning agent increases profitability and eliminates cyclic price wars when compared with agents using the myoptimal (myopically optimal) pricing strategy. Q-learning needs to select a sequence of state-action pairs for convergence. When the uniform random method is used to select state-action pairs, the number of accesses to the Q-table needed to obtain the optimal Q-values is quite large, so it is not appropriate for universal on-line learning in a real-world environment. This phenomenon occurs because uniform random selection reflects the uncertainty of exploitation with respect to the optimal policy. In this paper, we propose a Mixed Nonstationary Policy (MNP), which consists of both an auxiliary Markov process and the original Markov process. MNP tries to keep a balance between exploration and exploitation in reinforcement learning. Our experimental results show that the Q-learning agent using MNP converges to the optimal Q-values about 2.6 times faster on average than with uniform random selection.
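
A loose, illustrative sketch of a mixed nonstationary action-selection rule in the spirit the abstract describes: an auxiliary exploratory process is followed with a probability that decays over time, so exploration dominates early and exploitation later. The decay schedule and the uniform auxiliary process are assumptions, not the paper's MNP construction.

```python
import random

def mixed_nonstationary_action(q_table, state, actions, step, decay=0.001):
    """With probability eps(step) follow the auxiliary (exploratory) process,
    otherwise exploit the current Q-values; eps shrinks as learning proceeds."""
    eps = 1.0 / (1.0 + decay * step)     # nonstationary mixing weight (assumed)
    if random.random() < eps:
        return random.choice(actions)    # auxiliary process: uniform here
    return max(actions, key=lambda a: q_table.get((state, a), 0.0))
```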

Path Planning of Unmanned Aerial Vehicle based Reinforcement Learning using Deep Q Network under Simulated Environment (시뮬레이션 환경에서의 DQN을 이용한 강화 학습 기반의 무인항공기 경로 계획)

  • Lee, Keun Hyoung;Kim, Shin Dug
    • Journal of the Semiconductor & Display Technology / v.16 no.3 / pp.127-130 / 2017
  • In this research, we present a path planning method for the autonomous flight of unmanned aerial vehicles (UAVs) through reinforcement learning in a simulated environment. We design a simulator for the reinforcement learning of UAVs and implement an interface for compatibility between the Deep Q-Network (DQN) and the simulator. In this paper, we perform reinforcement learning through the simulator and the DQN, using the Q-learning algorithm, which is a kind of reinforcement learning algorithm. Through experimentation, we verify the performance of the DQN simulator. Finally, we evaluate the learning results and suggest a path planning strategy using reinforcement learning.
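
A compact sketch of the simulator/DQN interaction loop the abstract outlines; the simulator interface (`reset`, `step`), the `q_network` callable, and the epsilon-greedy/replay details are generic assumptions, not the authors' implementation.

```python
import random
from collections import deque

REPLAY_SIZE, BATCH, EPSILON = 10_000, 32, 0.1   # assumed hyper-parameters
replay_buffer = deque(maxlen=REPLAY_SIZE)

def run_dqn_episode(env, q_network, train_step, n_actions):
    """One episode: epsilon-greedy actions in the simulator, transitions stored,
    and the network trained from random minibatches of the replay buffer."""
    state, done, total_reward = env.reset(), False, 0.0
    while not done:
        if random.random() < EPSILON:
            action = random.randrange(n_actions)
        else:
            action = max(range(n_actions), key=lambda a: q_network(state)[a])
        next_state, reward, done = env.step(action)
        replay_buffer.append((state, action, reward, next_state, done))
        if len(replay_buffer) >= BATCH:
            train_step(random.sample(replay_buffer, BATCH))
        state, total_reward = next_state, total_reward + reward
    return total_reward
```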

A Research on Low-power Buffer Management Algorithm based on Deep Q-Learning approach for IoT Networks (IoT 네트워크에서의 심층 강화학습 기반 저전력 버퍼 관리 기법에 관한 연구)

  • Song, Taewon
    • Journal of Internet of Things and Convergence / v.8 no.4 / pp.1-7 / 2022
  • As the number of IoT devices increases, power management of the cluster head, which acts as a gateway between the cluster and sink nodes in an IoT network, becomes crucial. Particularly when the cluster head is a mobile wireless terminal, the power consumption of the IoT network must be minimized over its lifetime. In addition, the delay of information transmission in the IoT network is one of the primary metrics for rapid information collection. In this paper, we propose a low-power buffer management algorithm that takes the information transmission delay in an IoT network into account. By forwarding or skipping received packets using deep Q-learning, a deep reinforcement learning method, the suggested approach is able to reduce power consumption while keeping the transmission delay low. The proposed approach is demonstrated to reduce power consumption and to improve delay relative to an existing buffer management technique used as a comparison under the slotted ALOHA protocol.
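
A schematic sketch of the forward-or-skip decision described above, cast as a small reward function trading power against delay for the cluster head's DQN agent; the state features, weights, and energy costs are illustrative assumptions.

```python
FORWARD, SKIP = 0, 1
TX_ENERGY = 1.0          # assumed energy cost of one transmission
DELAY_WEIGHT = 0.5       # assumed weight of the delay term in the reward

def buffer_state(queue_length, head_packet_age, battery_level):
    """State observed by the cluster head's DQN agent (illustrative features)."""
    return (queue_length, head_packet_age, battery_level)

def reward(action, head_packet_age, battery_level):
    """Forwarding spends energy but clears delay; skipping saves energy but
    lets the head-of-line packet age. The agent learns to trade the two off."""
    if action == FORWARD:
        return -TX_ENERGY * (1.0 / max(battery_level, 1e-6))
    return -DELAY_WEIGHT * head_packet_age
```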