• Title/Summary/Keyword: CRITIC method

Design of Multiobjective Satisfactory Fuzzy Logic Controller using Reinforcement Learning

  • Kang, Dong-Oh;Zeungnam Bien
    • Proceedings of the IEEK Conference
    • /
    • 2000.07b
    • /
    • pp.677-680
    • /
    • 2000
  • The reinforcement learning technique is extended to solve the multiobjective control problem for uncertain dynamic systems. A multiobjective adaptive critic structure is proposed in order to realize a max-min method in the reinforcement learning process. The proposed reinforcement learning technique is then applied to the design of a multiobjective satisfactory fuzzy logic controller, in which the fuzzy logic subcontrollers are assumed to be derived from human experts. Some simulation results are given in order to show the effectiveness of the proposed method.
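
The max-min idea described above can be made concrete with a small tabular sketch: one critic per objective, with actions judged by their worst objective value. This is only an illustrative assumption about the structure (the paper works with fuzzy controllers and an adaptive critic, not a Q-table), and all names below are hypothetical.

```python
import numpy as np

# One critic (here a Q-table) per objective; hypothetical sizes.
n_states, n_actions, n_objectives = 10, 4, 3
Q = np.zeros((n_objectives, n_states, n_actions))
alpha, gamma = 0.1, 0.95

def select_action(state):
    # Max-min selection: judge each action by its worst objective,
    # then pick the action whose worst case is best.
    worst_case = Q[:, state, :].min(axis=0)   # min over objectives
    return int(np.argmax(worst_case))         # max over actions

def update(state, action, rewards, next_state):
    # Each critic learns from its own objective's reward signal.
    next_action = select_action(next_state)
    for k in range(n_objectives):
        td = rewards[k] + gamma * Q[k, next_state, next_action] - Q[k, state, action]
        Q[k, state, action] += alpha * td
```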

Automated Areal Feature Matching in Different Spatial Data-sets (이종의 공간 데이터 셋의 면 객체 자동 매칭 방법)

  • Kim, Ji Young;Lee, Jae Bin
    • Journal of Korean Society for Geospatial Information Science
    • /
    • v.24 no.1
    • /
    • pp.89-98
    • /
    • 2016
  • In this paper, we propose an automated areal feature matching method that is based on geometric similarity, requires no user intervention, and can be applied to areal features in many-to-many relations, for the conflation of spatial data-sets with different scales and updating cycles. First, areal features (nodes) whose inclusion-function value exceeds 0.4 were connected as edges in an adjacency matrix, and candidate corresponding areal features in many-to-many relations were identified by multiplication of the adjacency matrix. For geometrical matching, these multiple candidate corresponding areal features were transformed into an aggregated polygon, a convex hull generated by a curve-fitting algorithm. Second, we defined matching criteria to measure geometrical quality, and these criteria were converted into normalized similarity values by a similarity function. Next, shape similarity is defined as a weighted linear combination of these similarities, with weights calculated by the Criteria Importance Through Intercriteria Correlation (CRITIC) method (see the sketch below). Finally, on training data, we identified the Equal Error Rate (EER), the trade-off point in a plot of precision versus recall over all threshold values (PR curve), and used it as the threshold to decide whether each candidate pair is a corresponding pair. Applying the proposed method to a digital topographic map and a base map of the address system (KAIS), visual evaluation showed that some many-to-many areal features were mis-detected, while in statistical evaluation precision, recall, and F-measure were high at 0.951, 0.906, and 0.928, respectively. This means that the accuracy of automated matching between the different spatial data-sets by the proposed method is high. However, future research on the inclusion function and more detailed matching criteria is needed to exactly quantify many-to-many areal features.
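
Since CRITIC is the keyword of this result list, a minimal sketch of that weighting step may be useful. The function follows the standard CRITIC definition (contrast intensity times summed intercriteria conflict); the similarity matrix is invented illustration data, not values from the paper.

```python
import numpy as np

def critic_weights(X):
    # Rows: candidate pairs; columns: matching criteria (benefit type).
    Z = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))  # min-max normalize
    sigma = Z.std(axis=0, ddof=1)            # contrast intensity per criterion
    R = np.corrcoef(Z, rowvar=False)         # intercriteria correlation
    C = sigma * (1.0 - R).sum(axis=0)        # information content C_j
    return C / C.sum()                       # normalized CRITIC weights

# Hypothetical similarity values for three candidate pairs, three criteria.
X = np.array([[0.90, 0.82, 0.75],
              [0.40, 0.55, 0.60],
              [0.70, 0.68, 0.90]])
w = critic_weights(X)
shape_similarity = X @ w   # weighted linear combination, as in the abstract
```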

Comparison of Reinforcement Learning Algorithms for a 2D Racing Game Learning Agent (2D 레이싱 게임 학습 에이전트를 위한 강화 학습 알고리즘 비교 분석)

  • Lee, Dongcheul
    • The Journal of the Institute of Internet, Broadcasting and Communication
    • /
    • v.20 no.1
    • /
    • pp.171-176
    • /
    • 2020
  • Reinforcement learning is a well-known method for training an artificial software agent for a video game. Even though many reinforcement learning algorithms have been proposed, their performance varies depending on the application area. This paper compares the performance of these algorithms when training a reinforcement learning agent for a 2D racing game. We defined performance metrics to analyze the results and plotted them in various graphs. As a result, we found that ACER (Actor-Critic with Experience Replay) achieved the highest rewards, with a 157% gap between ACER and the worst-performing algorithm.
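
The reported gap is a simple ratio of mean rewards; the sketch below shows the computation with placeholder numbers chosen only to reproduce a 157% figure, not the paper's actual data.

```python
# Placeholder mean episode rewards per algorithm (invented values).
mean_rewards = {"ACER": 514.0, "PPO2": 430.0, "A2C": 390.0, "DQN": 200.0}
best = max(mean_rewards, key=mean_rewards.get)
worst = min(mean_rewards, key=mean_rewards.get)
gap = (mean_rewards[best] - mean_rewards[worst]) / mean_rewards[worst] * 100
print(f"{best} outperforms {worst} by {gap:.0f}%")   # -> 157% here
```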

Robot Control via RPO-based Reinforcement Learning Algorithm (RPO 기반 강화학습 알고리즘을 이용한 로봇제어)

  • Kim, Jong-Ho;Kang, Dae-Sung;Park, Joo-Young
    • Journal of the Korean Institute of Intelligent Systems
    • /
    • v.15 no.4
    • /
    • pp.505-510
    • /
    • 2005
  • The RPO (randomized policy optimizer) algorithm, which utilizes a probabilistic policy for action selection, is a recently developed tool in the area of reinforcement learning and has been shown to be very successful in several application problems. In this paper, we propose a modified RPO algorithm whose critic network is adapted via the RLS (recursive least squares) algorithm. In order to illustrate the applicability of the modified RPO method, we applied it to Kimura's robot and observed very good performance. We also developed a MATLAB-based animation program, by which the effect of the training algorithm on the acceleration of the robot movement was observed.
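
A recursive least squares update for a critic that is linear in a feature vector looks like the sketch below. The TD-style target and all names are assumptions for illustration; the paper's exact critic parameterization is not reproduced here.

```python
import numpy as np

class RLSCritic:
    """Critic V(s) ~= w . phi(s), adapted by recursive least squares."""
    def __init__(self, n_features, delta=1.0, lam=0.99):
        self.w = np.zeros(n_features)         # critic weights
        self.P = np.eye(n_features) / delta   # inverse-covariance estimate
        self.lam = lam                        # forgetting factor

    def update(self, phi, target):
        # Standard RLS recursion: gain, weight update, covariance update.
        P_phi = self.P @ phi
        gain = P_phi / (self.lam + phi @ P_phi)
        error = target - self.w @ phi
        self.w += gain * error
        self.P = (self.P - np.outer(gain, P_phi)) / self.lam
        return error
```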

Strategy to coordinate actions through a plant parameter prediction model during startup operation of a nuclear power plant

  • Jae Min Kim;Junyong Bae;Seung Jun Lee
    • Nuclear Engineering and Technology
    • /
    • v.55 no.3
    • /
    • pp.839-849
    • /
    • 2023
  • The development of automation technology to reduce human error by minimizing human intervention is accelerating with artificial intelligence and big data processing technology, even in the nuclear field. Among nuclear power plant operation modes, the startup and shutdown operations are still performed manually and thus have the potential for human error. As part of the development of an autonomous operation system for startup operation, this paper proposes an action coordination strategy to obtain the optimal actions. The lower level of the system consists of operating blocks that are created by analyzing the operation tasks and achieve local goals through soft actor-critic algorithms. However, when multiple agents try to perform conflicting actions, a method is needed to coordinate them; for this, an action coordination strategy was developed in this work as the upper level of the system. Three quantification methods were compared and evaluated based on the future plant state predicted by plant parameter prediction models using long short-term memory networks, as sketched below. Results confirmed that the optimal action satisfying the limiting conditions for operation can be selected by coordinating the action sets. It is expected that this methodology can be generalized through future research.
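
A hedged sketch of the upper-level coordination step: candidate joint action sets are scored by how safely the predicted plant parameters stay inside the limiting conditions for operation (LCO). The `predictor` callable stands in for the paper's LSTM prediction models, and all names are illustrative.

```python
def margin(value, lo, hi):
    # Normalized distance from a predicted parameter to the nearest LCO bound.
    return min(value - lo, hi - value) / (hi - lo)

def coordinate(candidates, predictor, limits):
    # Choose the action set whose worst predicted margin is largest.
    def score(action_set):
        predicted = predictor(action_set)   # dict: parameter name -> value
        return min(margin(predicted[name], lo, hi)
                   for name, (lo, hi) in limits.items())
    return max(candidates, key=score)
```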

Graph Neural Network and Reinforcement Learning based Optimal VNE Method in 5G and B5G Networks (5G 및 B5G 네트워크에서 그래프 신경망 및 강화학습 기반 최적의 VNE 기법)

  • Seok-Woo Park;Kang-Hyun Moon;Kyung-Taek Chung;In-Ho Ra
    • Smart Media Journal
    • /
    • v.12 no.11
    • /
    • pp.113-124
    • /
    • 2023
  • With the advent of 5G and B5G (Beyond 5G) networks, network virtualization technology that can overcome the limitations of existing networks is attracting attention. The purpose of network virtualization is to provide solutions for efficient network resource utilization and various services. Existing heuristic-based VNE (Virtual Network Embedding) techniques have been studied, but their flexibility is limited. Therefore, in this paper, we propose a GNN-based network slicing classification scheme to meet various service requirements and an RL-based VNE scheme for optimal resource allocation. The proposed method performs optimal VNE using an actor-critic network. Finally, to evaluate the performance of the proposed technique, we compare it with the Node Rank, MCST-VNE, and GCN-VNE techniques. The performance analysis shows that the GNN- and RL-based VNE technique outperforms the existing techniques in terms of acceptance rate and resource efficiency.
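
The actor-critic update at the core of such a scheme can be sketched as below. The two small MLPs are placeholders for the paper's GNN-based networks, and the state encoding and node-selection action space are assumptions, not taken from the paper.

```python
import torch
import torch.nn as nn

class ActorCritic(nn.Module):
    def __init__(self, n_features, n_substrate_nodes):
        super().__init__()
        # Actor scores substrate nodes; critic estimates the state value.
        self.actor = nn.Sequential(nn.Linear(n_features, 64), nn.ReLU(),
                                   nn.Linear(64, n_substrate_nodes))
        self.critic = nn.Sequential(nn.Linear(n_features, 64), nn.ReLU(),
                                    nn.Linear(64, 1))

    def forward(self, state):
        return self.actor(state), self.critic(state)

def update_step(model, optimizer, state, action, reward, next_value, gamma=0.99):
    # next_value: detached critic estimate of the successor state.
    logits, value = model(state)
    advantage = reward + gamma * next_value - value           # TD advantage
    policy_loss = -torch.log_softmax(logits, -1)[action] * advantage.detach()
    value_loss = advantage.pow(2)
    optimizer.zero_grad()
    (policy_loss + value_loss).backward()
    optimizer.step()
```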

A Study of the Method of Expression of Surrealism in the Modern Costume (1) (현대복식에 응용된 초현실주의적 표현방법 고찰(I) -1989~1994년 복식을 중심으로-)

  • 곽미영;정흥숙
    • Journal of the Korean Society of Clothing and Textiles
    • /
    • v.19 no.2
    • /
    • pp.380-392
    • /
    • 1995
  • The purpose of this study was to compare the method of depaysement in Surrealism with modern costume. Surrealism was based on Freud's theory of the unconscious and the Hegelian dialectic. Previous studies show that its methods of expression and its sources of inspiration have had a continuous influence on the field of fashion. Surrealism stimulated Elsa Schiaparelli (1890-1973) to make creative and innovative costumes that created a sensation in the world of fashion in the 1930s, and it has continued to influence modern fashion. From this point of view, I examined the surrealist painter Rene Magritte (1898-1967), who used the shocking method of depaysement, through literature and photographs, and researched the Paris and London collections from 1989 to 1994 in order to analyze them in comparison with depaysement in the paintings of R. Magritte. Analyzing the main works of R. Magritte according to the classification of Suzi Gablik (an art critic), I found that he expressed everyday objects through various forms of depaysement. I also showed that modern fashion, which was shocking, innovative, and avant-garde, presented the unconscious through these expressions of depaysement with common subjects. In consequence, the surrealist method of expression has had a durable influence on modern costume.

Multi-Dimensional Reinforcement Learning Using a Vector Q-Net - Application to Mobile Robots

  • Kiguchi, Kazuo;Nanayakkara, Thrishantha;Watanabe, Keigo;Fukuda, Toshio
    • International Journal of Control, Automation, and Systems
    • /
    • v.1 no.1
    • /
    • pp.142-148
    • /
    • 2003
  • Reinforcement learning is considered an important tool for robotic learning in unknown/uncertain environments. In this paper, we propose an evaluation function expressed in vector form to realize multi-dimensional reinforcement learning. The novel feature of the proposed method is that learning one behavior induces parallel learning of other behaviors, even though the objectives of the behaviors differ. In brief, all behaviors watch the other behaviors from a critical point of view; therefore, the proposed method involves cross-criticism and parallel learning, which make the multi-dimensional learning process more efficient. By applying the proposed learning method, we carried out multi-dimensional evaluation (reward) and multi-dimensional learning simultaneously in one trial. A special neural network (Q-net), in which the weights and the output are represented by vectors, is proposed to realize a critic network for Q-learning. The proposed learning method is applied to behavior planning of mobile robots.
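
A tabular sketch of the vector-valued Q update described above: the Q-value of a state-action pair is a vector with one component per behavior, and every component is updated from its own reward in the same trial. The scalarization used for action choice is an assumption; the paper realizes the critic with a vector Q-net rather than a table.

```python
import numpy as np

n_states, n_actions, n_behaviors = 20, 4, 3
Q = np.zeros((n_states, n_actions, n_behaviors))   # vector-valued Q-table
alpha, gamma = 0.1, 0.9
weights = np.ones(n_behaviors) / n_behaviors       # assumed scalarization

def step(s, a, reward_vec, s_next):
    # Greedy action on the weighted sum of behavior values...
    a_next = int(np.argmax(Q[s_next] @ weights))
    # ...but every behavior component learns in parallel from its own reward.
    Q[s, a] += alpha * (reward_vec + gamma * Q[s_next, a_next] - Q[s, a])
```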

Blockchain Based Financial Portfolio Management Using A3C (A3C를 활용한 블록체인 기반 금융 자산 포트폴리오 관리)

  • Kim, Ju-Bong;Heo, Joo-Seong;Lim, Hyun-Kyo;Kwon, Do-Hyung;Han, Youn-Hee
    • KIPS Transactions on Computer and Communication Systems
    • /
    • v.8 no.1
    • /
    • pp.17-28
    • /
    • 2019
  • In financial investment management, the strategy of distributing investment by selecting and combining various financial assets is called portfolio management theory. In recent years, blockchain-based financial assets such as cryptocurrencies have been traded on several well-known exchanges, and an efficient portfolio management approach is required in order for investors to steadily raise their return on investment in cryptocurrencies. Meanwhile, deep learning has shown remarkable results in various fields, and research on applying deep reinforcement learning algorithms to portfolio management has begun. In this paper, we propose an efficient financial portfolio investment management method based on Asynchronous Advantage Actor-Critic (A3C), a representative asynchronous reinforcement learning algorithm. In addition, since the conventional cross-entropy function cannot be applied directly to portfolio management, we propose a modified cross-entropy loss that fits the portfolio investment setting. Finally, we compare the proposed A3C model with an existing reinforcement learning based cryptocurrency portfolio investment algorithm and show that the performance of the proposed A3C model is better.
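
The key constraint in this setting is that the policy output must be a valid allocation (non-negative weights summing to one). The sketch below shows one way to enforce that with a softmax and a cross-entropy-style loss against a target allocation; it is an assumed stand-in, since the paper's exact modified cross-entropy is not reproduced here.

```python
import torch
import torch.nn.functional as F

def portfolio_weights(logits):
    # Softmax yields non-negative weights that sum to 1 across assets.
    return F.softmax(logits, dim=-1)

def allocation_loss(logits, target_weights):
    # Cross-entropy between a target allocation and the predicted weights,
    # treating both as distributions over assets.
    log_w = F.log_softmax(logits, dim=-1)
    return -(target_weights * log_w).sum(dim=-1).mean()
```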

Model-free $H_{\infty}$ Control of Linear Discrete-time Systems using Q-learning and LMI Based on I/O Data (입출력 데이터 기반 Q-학습과 LMI를 이용한 선형 이산 시간 시스템의 모델-프리 $H_{\infty}$ 제어기 설계)

  • Kim, Jin-Hoon;Lewis, F.L.
    • The Transactions of The Korean Institute of Electrical Engineers
    • /
    • v.58 no.7
    • /
    • pp.1411-1417
    • /
    • 2009
  • In this paper, we consider the design of $H_{\infty}$ control for linear discrete-time systems having no mathematical model. The basic approach is to use Q-learning, a reinforcement learning method based on an actor-critic structure. The model-free control design uses not a mathematical model of the system but information on its states and inputs. As a result, the derived iterative algorithm is expressed as linear matrix inequalities (LMIs) in data measured from the system states and inputs. It is shown that, for a sufficiently rich disturbance, this algorithm converges to the standard $H_{\infty}$ control solution obtained using the exact system model. A simple numerical example is given to show the usefulness of our result in practical applications.
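
For context, the zero-sum game behind such a design is usually posed as follows, with the control $u$ minimizing and the disturbance $w$ maximizing; this is the textbook formulation and may differ from the paper's exact notation (here $Q_{1}$ and $R$ are weighting matrices, distinct from the Q-function):

$$Q(x_k, u_k, w_k) = x_k^T Q_1 x_k + u_k^T R u_k - \gamma^2 w_k^T w_k + \min_u \max_w Q(x_{k+1}, u, w)$$

Q-learning estimates this Q-function directly from measured state and input data, which is what removes the need for the system matrices and makes the design model-free.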