• Title/Abstract/Keyword: Q-learning system

입출력 데이터 기반 Q-학습과 LMI를 이용한 선형 이산 시간 시스템의 모델-프리 $H_{\infty}$ 제어기 설계 (Model-free $H_{\infty}$ Control of Linear Discrete-time Systems using Q-learning and LMI Based on I/O Data)

  • 김진훈
    • 전기학회논문지 / Vol. 58, No. 7 / pp.1411-1417 / 2009
  • In this paper, we consider the design of $H_{\infty}$ controllers for linear discrete-time systems for which no mathematical model is available. The basic approach is to use Q-learning, a reinforcement learning method based on an actor-critic structure. The model-free design uses not a mathematical model of the system but measured information on its states and inputs. As a result, the derived iterative algorithm is expressed as linear matrix inequalities (LMIs) over data measured from the system's states and inputs. It is shown that, for a sufficiently rich disturbance, this algorithm converges to the standard $H_{\infty}$ control solution obtained using the exact system model. A simple numerical example shows the usefulness of our result in practical applications.
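
A possible illustration for the entry above: the data-driven core of such methods can be sketched as identifying a quadratic Q-function $Q(x,u,w)=z^T H z$ with $z=[x;u;w]$ from measured trajectories alone, then improving the control and disturbance policies from the blocks of $H$. The numpy sketch below solves the policy-evaluation step by least squares rather than the paper's LMI formulation; the system matrices, $\gamma$, and dimensions are illustrative assumptions used only to generate data, never by the learner.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical 2-state plant used ONLY to generate data; the learner never sees A, B, E.
A = np.array([[0.9, 0.2], [0.0, 0.8]]); B = np.array([[0.0], [1.0]]); E = np.array([[1.0], [0.5]])
Qc, R, gamma2 = np.eye(2), np.eye(1), 4.0        # stage cost x'Qc x + u'R u - gamma^2 w'w

def rollout(K, L, n=400):
    """Collect (z_k, stage cost, z_{k+1}) under exploratory ("sufficiently rich") noise."""
    x, Z, C, Zn = rng.standard_normal(2), [], [], []
    for _ in range(n):
        u = K @ x + 0.5 * rng.standard_normal(1)         # control + exploration
        w = L @ x + 0.5 * rng.standard_normal(1)         # disturbance + excitation
        xn = A @ x + B @ u + E @ w
        Z.append(np.concatenate([x, u, w]))
        Zn.append(np.concatenate([xn, K @ xn, L @ xn]))  # next z under the current policies
        C.append(x @ Qc @ x + u @ R @ u - gamma2 * (w @ w))
        x = xn
    return np.array(Z), np.array(C), np.array(Zn)

def quad_features(Z):
    """Symmetric quadratic basis so that z'Hz = phi(z) . vech(H)."""
    n = Z.shape[1]
    idx = [(i, j) for i in range(n) for j in range(i, n)]
    return np.array([[z[i] * z[j] * (1 if i == j else 2) for i, j in idx] for z in Z]), idx

K, L = np.zeros((1, 2)), np.zeros((1, 2))                # initial (stabilizing) policies
for _ in range(10):
    Z, C, Zn = rollout(K, L)
    Phi, idx = quad_features(Z); Phin, _ = quad_features(Zn)
    theta, *_ = np.linalg.lstsq(Phi - Phin, C, rcond=None)   # Bellman least-squares solve
    H = np.zeros((4, 4))
    for t, (i, j) in zip(theta, idx):
        H[i, j] = H[j, i] = t
    Hxu, Hxw = H[:2, 2:3], H[:2, 3:4]
    Huu, Huw, Hww = H[2:3, 2:3], H[2:3, 3:4], H[3:4, 3:4]
    # Policy improvement for the zero-sum game, by block elimination of w then u.
    S = Huu - Huw @ np.linalg.inv(Hww) @ Huw.T
    K = -np.linalg.inv(S) @ (Hxu.T - Huw @ np.linalg.inv(Hww) @ Hxw.T)
    L = -np.linalg.inv(Hww) @ (Hxw.T + Huw.T @ K)
print("learned state-feedback gain K:", K)
```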

Multi-Dimensional Reinforcement Learning Using a Vector Q-Net - Application to Mobile Robots

  • Kiguchi, Kazuo;Nanayakkara, Thrishantha;Watanabe, Keigo;Fukuda, Toshio
    • International Journal of Control, Automation, and Systems / Vol. 1, No. 1 / pp.142-148 / 2003
  • Reinforcement learning is considered an important tool for robotic learning in unknown/uncertain environments. In this paper, we propose an evaluation function expressed in vector form to realize multi-dimensional reinforcement learning. The novel feature of the proposed method is that learning one behavior induces parallel learning of other behaviors, even though the objectives of the behaviors differ. In brief, all behaviors watch the other behaviors from a critical point of view; this cross-criticism and parallel learning make the multi-dimensional learning process more efficient. By applying the proposed method, we carried out multi-dimensional evaluation (reward) and multi-dimensional learning simultaneously in one trial. A special neural network (Q-net), in which the weights and the output are represented by vectors, is proposed to realize a critic network for Q-learning. The proposed learning method is applied to behavior planning of mobile robots.
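
The vector-Q idea can be illustrated with a toy tabular sketch (the paper uses a neural Q-net; the environment, dimensions, and blending weights below are invented for illustration): Q-values are vectors with one component per behavior, so a single transition updates every behavior's evaluation in parallel.

```python
import numpy as np

# Toy tabular stand-in for the vector Q-net: Q-values are vectors with one component
# per behavior, so one transition updates every behavior's evaluation in parallel.
N_STATES, N_ACTIONS, N_BEHAVIORS = 16, 4, 3      # e.g. avoid / approach / follow-wall
Q = np.zeros((N_STATES, N_ACTIONS, N_BEHAVIORS))
alpha, gamma, eps = 0.1, 0.95, 0.1
blend = np.array([0.5, 0.3, 0.2])                # weights used only when acting
rng = np.random.default_rng(1)

def step(s, a):
    """Hypothetical environment: random next state and a vector reward per behavior."""
    return rng.integers(N_STATES), rng.standard_normal(N_BEHAVIORS)

s = 0
for _ in range(5000):
    # Act on a scalarized blend of behaviors, with epsilon-greedy exploration.
    a = rng.integers(N_ACTIONS) if rng.random() < eps else int(np.argmax(Q[s] @ blend))
    s_next, r_vec = step(s, a)
    # "Cross-criticism": every behavior evaluates the same transition at once.
    td_target = r_vec + gamma * Q[s_next].max(axis=0)    # per-component max over actions
    Q[s, a] += alpha * (td_target - Q[s, a])
    s = s_next
```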

Priority-based learning automata in Q-learning random access scheme for cellular M2M communications

  • Shinkafi, Nasir A.;Bello, Lawal M.;Shu'aibu, Dahiru S.;Mitchell, Paul D.
    • ETRI Journal / Vol. 43, No. 5 / pp.787-798 / 2021
  • This paper applies learning automata to improve the performance of a Q-learning-based random access channel (QL-RACH) scheme in a cellular machine-to-machine (M2M) communication system. A prioritized learning automata QL-RACH (PLA-QL-RACH) access scheme is proposed. The scheme employs a prioritized learning automata technique to improve throughput by minimizing the interaction and collision of M2M devices with the human-to-human devices sharing the RACH of a cellular system. In addition, the scheme eliminates the excessive punishment suffered by M2M devices by controlling how penalties are administered. Simulation results show that the proposed PLA-QL-RACH scheme improves RACH throughput by approximately 82% and reduces access delay by 79%, with faster learning convergence, compared with QL-RACH.
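
A rough sketch of the underlying QL-RACH mechanics (the exact PLA update rules are in the paper; the per-device priority scaling below is only an illustrative stand-in): each device learns a Q-value per RACH slot, is rewarded on a lone successful access, and receives a penalty, scaled down by its priority, on collision.

```python
import numpy as np

# Hedged sketch of QL-RACH with softened, priority-scaled penalties.
N_DEVICES, N_SLOTS, FRAMES = 20, 10, 2000
alpha, reward, base_penalty = 0.1, 1.0, 1.0
priority = np.linspace(1.0, 0.2, N_DEVICES)      # hypothetical per-device penalty scaling
Q = np.zeros((N_DEVICES, N_SLOTS))
rng = np.random.default_rng(2)

successes = 0
for frame in range(FRAMES):
    # Each device picks the slot with the highest Q-value (ties broken by tiny noise).
    noise = 1e-6 * rng.standard_normal(Q.shape)
    slots = np.argmax(Q + noise, axis=1)
    for s in range(N_SLOTS):
        users = np.flatnonzero(slots == s)
        if len(users) == 1:                      # lone device: successful access
            Q[users[0], s] += alpha * (reward - Q[users[0], s])
            successes += 1
        elif len(users) > 1:                     # collision: scaled-down punishment
            for d in users:
                Q[d, s] += alpha * (-base_penalty * priority[d] - Q[d, s])
print(f"throughput ~ {successes / FRAMES:.2f} successes per frame")
```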

12각형 기반의 Q-learning과 SVM을 이용한 군집로봇의 목표물 추적 알고리즘 (Object Tracking Algorithm of a Swarm Robot System Using SVM and Dodecagon-Based Q-Learning)

  • 서상욱;양현창;심귀보
    • 한국지능시스템학회논문지 / Vol. 18, No. 3 / pp.291-296 / 2008
  • In this paper, we propose a dodecagon-based Q-learning algorithm using SVM for object tracking in a swarm robot system. To show the effectiveness of the proposed algorithm, we set up an experiment in which several robots, a number of obstacles, and a single target are placed, and each robot must find the hidden target. We ran the experiment with three methods: a random search, a fusion model of DBAM and ABAM, and finally the proposed SVM with dodecagon-based Q-learning, and validated the proposed approach by comparing the three.
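
One plausible reading of "dodecagon-based" Q-learning, sketched below under our own assumptions (the SVM stage and the paper's true state encoding are omitted): the robot quantizes its surroundings into twelve 30° sectors and learns Q-values over the twelve candidate headings.

```python
import numpy as np

N_SECTORS = 12
angles = np.deg2rad(np.arange(N_SECTORS) * 30.0)       # 12 candidate headings
Q = np.zeros((2 ** N_SECTORS, N_SECTORS))              # state: obstacle bitmask per sector
alpha, gamma, eps = 0.2, 0.9, 0.1
rng = np.random.default_rng(3)

def sector_state(robot, obstacles, radius=2.0):
    """Encode which of the 12 sectors around the robot contain a nearby obstacle."""
    mask = 0
    for ob in obstacles:
        d = ob - robot
        if np.hypot(*d) < radius:
            mask |= 1 << (int(np.degrees(np.arctan2(d[1], d[0])) // 30) % N_SECTORS)
    return mask

robot, target = np.zeros(2), np.array([8.0, 5.0])
obstacles = [rng.uniform(0, 10, 2) for _ in range(25)]
s = sector_state(robot, obstacles)
for _ in range(3000):
    a = rng.integers(N_SECTORS) if rng.random() < eps else int(np.argmax(Q[s]))
    robot = robot + np.array([np.cos(angles[a]), np.sin(angles[a])])
    s_next = sector_state(robot, obstacles)
    # Reward: progress toward the target, penalty if the chosen sector was blocked.
    r = -np.linalg.norm(target - robot) / 10.0 - (2.0 if (s >> a) & 1 else 0.0)
    Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
    s = s_next
    if np.linalg.norm(target - robot) < 1.0:           # reached the target: restart episode
        robot = np.zeros(2); s = sector_state(robot, obstacles)
```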

Visual Analysis of Deep Q-network

  • Seng, Dewen;Zhang, Jiaming;Shi, Xiaoying
    • KSII Transactions on Internet and Information Systems (TIIS) / Vol. 15, No. 3 / pp.853-873 / 2021
  • In recent years, deep reinforcement learning (DRL) models have attracted great interest owing to their success in a variety of challenging tasks. The Deep Q-Network (DQN) is a widely used DRL model that trains an intelligent agent to execute optimal actions while interacting with an environment. The model is well known for surpassing skilled human players across many Atari 2600 games. Although DQN has achieved excellent performance in practice, a clear understanding of why the model works is still lacking. In this paper, we present a visual analytics system for understanding the deep Q-network in a non-blind manner. Based on data stored during the training and testing process, four coordinated views are designed to expose the internal execution mechanism of DQN from different perspectives. We report the system performance and demonstrate its effectiveness through two case studies. Using our system, users can learn the relationship between states and Q-values, the function of the convolutional layers, the strategies learned by DQN, and the rationality of the decisions made by the agent.
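
On the data side, a system like this needs a per-step trace relating states to Q-values before any view can be drawn. A minimal logging sketch, with stubs standing in for the trained network and the environment (none of this is the paper's actual pipeline):

```python
import json
import numpy as np

rng = np.random.default_rng(4)
W = rng.standard_normal((8, 4))            # stub for a trained Q-network (8-dim state, 4 actions)

def q_values(state):
    return state @ W                       # stand-in for a forward pass through the DQN

log = []
state = rng.standard_normal(8)
for t in range(200):
    q = q_values(state)
    action = int(np.argmax(q))
    reward = float(rng.standard_normal())  # stand-in for the environment's reward
    log.append({"step": t, "state": state.tolist(),
                "q_values": q.tolist(), "action": action, "reward": reward})
    state = rng.standard_normal(8)         # stand-in for the next observation

# Persist the trace; a front end (e.g. coordinated views) would read this file.
with open("dqn_trace.json", "w") as f:
    json.dump(log, f)
```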

다각형 기반의 Q-Learning과 Cascade SVM을 이용한 군집로봇의 목표물 추적 알고리즘 (Object Tracking Algorithm of a Swarm Robot System Using Polygon-Based Q-Learning and Cascade SVM)

  • 서상욱;양현창;심귀보
    • 대한임베디드공학회논문지 / Vol. 3, No. 2 / pp.119-125 / 2008
  • This paper presents a polygon-based Q-learning and cascade support vector machine (SVM) algorithm for object search with multiple robots. We organized an experimental environment with ten mobile robots, twenty-five obstacles, and one object, and sent the robots into a hallway, where some obstacles were lying about, to search for the hidden object. In the experiments, we used four different control methods: a random search; a fusion model with distance-based action making (DBAM) and area-based action making (ABAM) to determine the robots' next actions; hexagon-based Q-learning; and dodecagon-based Q-learning with cascade SVM to enhance the DBAM/ABAM fusion model.
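
The cascade-SVM stage can be sketched independently of the robots (a generic two-layer cascade under our own assumptions, using scikit-learn, not necessarily the paper's exact cascade): train SVMs on partitions of the data, keep only their support vectors, and train a final SVM on the pooled support vectors.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(5)
X = rng.standard_normal((400, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)          # hypothetical labels for illustration

parts = np.array_split(rng.permutation(len(X)), 4)
sv_X, sv_y = [], []
for idx in parts:
    svm = SVC(kernel="rbf").fit(X[idx], y[idx])
    sv_X.append(X[idx][svm.support_])            # support vectors survive to the next layer
    sv_y.append(y[idx][svm.support_])

final = SVC(kernel="rbf").fit(np.vstack(sv_X), np.concatenate(sv_y))
print("final model trained on", sum(len(s) for s in sv_X), "pooled support vectors")
```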

Object Tracking Algorithm of a Swarm Robot System Using Polygon-Based Q-Learning and Parallel SVM

  • Seo, Sang-Wook;Yang, Hyun-Chang;Sim, Kwee-Bo
    • International Journal of Fuzzy Logic and Intelligent Systems / Vol. 8, No. 3 / pp.220-224 / 2008
  • This paper presents a polygon-based Q-learning and parallel SVM algorithm for object search with multiple robots. We organized an experimental environment with one hundred mobile robots, two hundred obstacles, and ten objects, and sent the robots into a hallway, where some obstacles were lying about, to search for a hidden object. In the experiments, we used four different control methods: a random search; a fusion model with distance-based action making (DBAM) and area-based action making (ABAM) to determine the robots' next actions; hexagon-based Q-learning; and dodecagon-based Q-learning with the parallel SVM algorithm to enhance the DBAM/ABAM fusion model. The results show that dodecagon-based Q-learning with the parallel SVM algorithm tracks the object better than the other algorithms.
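
The "parallel" variant differs from the cascade sketch above mainly in that the first-layer SVMs are independent and can be trained concurrently. A hedged sketch using a process pool (the partitioning scheme and data are illustrative assumptions):

```python
import numpy as np
from concurrent.futures import ProcessPoolExecutor
from sklearn.svm import SVC

def train_partition(args):
    """Train one first-layer SVM and return its support vectors."""
    X_part, y_part = args
    svm = SVC(kernel="rbf").fit(X_part, y_part)
    return X_part[svm.support_], y_part[svm.support_]

if __name__ == "__main__":
    rng = np.random.default_rng(6)
    X = rng.standard_normal((4000, 2))
    y = (X[:, 0] ** 2 + X[:, 1] > 0.5).astype(int)
    parts = [(X[i::4], y[i::4]) for i in range(4)]       # 4 disjoint partitions
    with ProcessPoolExecutor(max_workers=4) as pool:     # first layer trains in parallel
        results = list(pool.map(train_partition, parts))
    sv_X = np.vstack([r[0] for r in results])
    sv_y = np.concatenate([r[1] for r in results])
    final = SVC(kernel="rbf").fit(sv_X, sv_y)            # second layer on pooled support vectors
    print("pooled support vectors:", len(sv_X))
```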

애드혹 센서 네트워크 수명 연장을 위한 Q-러닝 기반 에너지 균등 소비 라우팅 프로토콜 기법 (Equal Energy Consumption Routing Protocol Algorithm Based on Q-Learning for Extending the Lifespan of Ad-Hoc Sensor Network)

  • 김기상;김승욱
    • 정보처리학회논문지:컴퓨터 및 통신 시스템 / Vol. 10, No. 10 / pp.269-276 / 2021
  • Recently, smart sensors have come into use in a variety of environments, and research on implementing ad-hoc sensor networks (ASNs) is being actively conducted. However, existing sensor network routing algorithms focus on specific control problems and cannot be applied directly to ASN operation. In this paper, we propose a new routing protocol based on Q-learning; the main goal of the proposed approach is to extend the lifespan of the ASN through efficient energy allocation while maintaining balanced system performance. A distinguishing feature of the proposed method is that it strengthens the effect of Q-learning by taking various environmental factors into account; in particular, each node stores the Q-values of its neighboring nodes in its own Q-table, and these Q-values are updated and accumulated whenever a data transmission takes place, so that the optimal routing path is selected. Simulation results confirm that the proposed method selects energy-efficient routing paths and achieves better network performance than existing ASN routing protocols.
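
The neighbor-Q-table mechanism described in the abstract might look like the following sketch (topology, energy costs, and reward shaping are our own illustrative assumptions): each node keeps a Q-value per neighbor and updates it after every transmission, with a reward favoring relays that are close to the sink and still rich in energy.

```python
import numpy as np

rng = np.random.default_rng(7)
N = 20
pos = rng.uniform(0, 10, (N, 2)); pos[0] = (10.0, 10.0)   # node 0 is the sink
energy = np.ones(N)
neighbors = [
    [j for j in range(N) if j != i and np.linalg.norm(pos[i] - pos[j]) < 4.0]
    for i in range(N)
]
Q = {i: {j: 0.0 for j in neighbors[i]} for i in range(N)}
alpha, gamma = 0.3, 0.9

def send_packet(src):
    node, hops = src, 0
    while node != 0 and hops < 50 and neighbors[node]:
        nbrs = neighbors[node]
        # Greedy on Q, exploring occasionally.
        nxt = rng.choice(nbrs) if rng.random() < 0.1 else max(nbrs, key=lambda j: Q[node][j])
        energy[nxt] -= 0.01                                # transmission cost at the relay
        # Reward: remaining energy at the relay minus normalized distance to the sink.
        r = energy[nxt] - np.linalg.norm(pos[nxt] - pos[0]) / 14.0
        best_next = max(Q[nxt].values(), default=0.0) if nxt != 0 else 0.0
        Q[node][nxt] += alpha * (r + gamma * best_next - Q[node][nxt])
        node, hops = nxt, hops + 1

for _ in range(2000):
    send_packet(rng.integers(1, N))
print("min residual energy:", round(float(energy.min()), 3))
```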

Q-Learning을 이용한 Intelligent Transportation System (Intelligent Transportation System Using Q-Learning)

  • 박명수;김표재;최진영
    • 대한전자공학회:학술대회논문집 / 대한전자공학회 2003년도 하계종합학술대회 논문집 Ⅲ / pp.1299-1302 / 2003
  • In this paper, we propose a new method that can efficiently provide users with a path to a target place. It stores the state of the roads leading to the target place in the form of a Q-table and finds a proper path using that Q-table. The Q-table is updated with information about real traffic conditions reported by users. This method can provide a proper path using less storage and less computation time than the conventional method, which stores the entire road traffic information and finds a path with a graph-search algorithm.
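
A compact sketch of this Q-table scheme (the road network, travel times, and update constants are invented for illustration): one Q-value per (intersection, outgoing road) pair, updated from user-reported travel times, with the route read off greedily so no graph search over raw traffic data is needed at query time.

```python
import numpy as np

rng = np.random.default_rng(8)
N = 8                                            # intersections; node N-1 is the destination
roads = {i: [j for j in range(N) if j != i and rng.random() < 0.5] for i in range(N)}
Q = {i: {j: 0.0 for j in roads[i]} for i in range(N)}
alpha, gamma = 0.2, 0.95

def report(i, j, travel_time):
    """Fold one user-reported traversal of road i->j into the Q-table."""
    best_next = 0.0 if j == N - 1 else max(Q[j].values(), default=-100.0)
    Q[i][j] += alpha * (-travel_time + gamma * best_next - Q[i][j])

# Users report trips; travel times are hypothetical stand-ins for real traffic data.
for _ in range(5000):
    i = rng.integers(N - 1)
    if roads[i]:
        report(i, int(rng.choice(roads[i])), travel_time=float(rng.uniform(1, 10)))

def route(start):
    """Greedy path extraction from the Q-table (no graph search at query time)."""
    path, node = [start], start
    while node != N - 1 and Q[node] and len(path) < 2 * N:
        node = max(Q[node], key=Q[node].get)
        path.append(node)
    return path

print("suggested route:", route(0))
```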

강화학습법을 이용한 유역통합 저수지군 운영 (Basin-Wide Multi-Reservoir Operation Using Reinforcement Learning)

  • 이진희;심명필
    • 한국수자원학회:학술대회논문집 / 한국수자원학회 2006년도 학술발표회 논문집 / pp.354-359 / 2006
  • The analysis of large-scale water resources systems is often complicated by the presence of multiple reservoirs and diversions, the uncertainty of unregulated inflows and demands, and conflicting objectives. Reinforcement learning is presented herein as a new approach to the challenging problem of stochastic optimization of multi-reservoir systems. The Q-Learning method, one of the reinforcement learning algorithms, is used to generate integrated monthly operation rules for the Keum River basin in Korea. The Q-Learning model is evaluated by comparison with implicit stochastic dynamic programming and sampling stochastic dynamic programming approaches. The evaluation of the stochastic basin-wide operational models considered several options for the choice of hydrologic state and discount factors, as well as various stochastic dynamic programming models. The Q-Learning model outperforms the other models in handling the uncertainty of inflows.
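
A toy version of monthly Q-learning reservoir operation, under our own simplified assumptions (none of the numbers are Keum River data): the state is (month, discretized storage), the action is a discretized release, and inflows are sampled stochastically.

```python
import numpy as np

rng = np.random.default_rng(9)
N_STORAGE, N_RELEASE, CAP, DEMAND = 10, 5, 100.0, 30.0
Q = np.zeros((12, N_STORAGE, N_RELEASE))
alpha, gamma, eps = 0.1, 0.95, 0.1
releases = np.linspace(0.0, 50.0, N_RELEASE)
mean_inflow = 20.0 + 15.0 * np.sin(np.arange(12) / 12.0 * 2 * np.pi)   # seasonal inflows

def bucket(storage):
    """Discretize storage into one of N_STORAGE bins."""
    return min(int(storage / CAP * N_STORAGE), N_STORAGE - 1)

storage = CAP / 2
for episode in range(20000):
    month = episode % 12
    s = bucket(storage)
    a = rng.integers(N_RELEASE) if rng.random() < eps else int(np.argmax(Q[month, s]))
    release = min(releases[a], storage)
    inflow = max(0.0, rng.normal(mean_inflow[month], 5.0))             # stochastic inflow
    spill = max(0.0, storage - release + inflow - CAP)
    storage = min(CAP, storage - release + inflow)
    # Reward penalizes unmet demand and spills (a simple stand-in objective).
    r = -abs(DEMAND - release) - 0.5 * spill
    s_next, m_next = bucket(storage), (month + 1) % 12
    Q[month, s, a] += alpha * (r + gamma * Q[m_next, s_next].max() - Q[month, s, a])
```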
