• Title/Summary/Keyword: Meta reinforcement learning

Control for Manipulator of an Underwater Robot Using Meta Reinforcement Learning (메타강화학습을 이용한 수중로봇 매니퓰레이터 제어)

  • Moon, Ji-Youn;Moon, Jang-Hyuk;Bae, Sung-Hoon
    • The Journal of the Korea institute of electronic communication sciences, v.16 no.1, pp.95-100, 2021
  • This paper introduces model-based meta reinforcement learning as a control method for the manipulator of an underwater construction robot. Model-based meta reinforcement learning quickly updates the dynamics model using recent experience in the real application and passes the model to model predictive control, which computes the control inputs needed for the manipulator to reach the target position. The simulation environment for model-based meta reinforcement learning is established using MuJoCo and Gazebo, and the real environment for manipulator control of the underwater construction robot is set up to deal with model uncertainties.
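The abstract only outlines the control pipeline, so the following is a minimal, illustrative sketch (not the paper's implementation) of how a freshly fitted dynamics model can be handed to a random-shooting model predictive controller that drives the manipulator toward a target position; the linear model form, the dimensions, and the cost are assumptions.

```python
import numpy as np

def fit_linear_dynamics(states, actions, next_states):
    """Least-squares fit of s' ~ [s, a] @ W (a stand-in for the adapted model)."""
    X = np.hstack([states, actions])
    W, *_ = np.linalg.lstsq(X, next_states, rcond=None)
    return W

def mpc_action(state, target, W, horizon=5, n_candidates=256, action_dim=2, rng=None):
    """Random-shooting MPC: sample action sequences, roll out the learned model,
    and return the first action of the sequence that ends closest to the target."""
    rng = rng or np.random.default_rng(0)
    best_cost, best_first = np.inf, np.zeros(action_dim)
    for _ in range(n_candidates):
        seq = rng.uniform(-1.0, 1.0, size=(horizon, action_dim))
        s = state.copy()
        for a in seq:
            s = np.hstack([s, a]) @ W            # one-step model prediction
        cost = np.linalg.norm(s - target)        # distance to the target position
        if cost < best_cost:
            best_cost, best_first = cost, seq[0]
    return best_first

# toy usage with a 2-D "joint position" state and a 2-D control input
rng = np.random.default_rng(0)
states = rng.normal(size=(200, 2)); actions = rng.uniform(-1, 1, size=(200, 2))
next_states = states + 0.1 * actions             # simple ground-truth dynamics
W = fit_linear_dynamics(states, actions, next_states)
u = mpc_action(np.zeros(2), target=np.array([0.5, -0.2]), W=W)
print("first MPC control input:", u.round(3))
```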

Locomotion of Crawling Robots Based on Reinforcement Learning and Meta-Learning (강화학습 기법과 메타학습을 이용한 기는 로봇의 이동)

  • Mun, Yeong-Jun;Jeong, Gyu-Baek;Park, Ju-Yeong
    • Proceedings of the Korean Institute of Intelligent Systems Conference, 2007.11a, pp.395-398, 2007
  • Recently, interest in reinforcement learning has grown rapidly in the artificial intelligence field, and the technique has been applied to many related areas. In this paper, we consider how to use a meta-learning technique to determine the key parameters of the RLS-NAC algorithm, an actor-critic reinforcement learning method, when it is applied to the locomotion of Kimura's crawling robot.
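As a rough illustration of the meta-level idea only (not RLS-NAC itself), the sketch below treats one key parameter of the base learner, such as the RLS forgetting factor, as something chosen by a simple meta-level search over episode returns; the evaluation function is a toy stand-in, not the crawling-robot learner.

```python
import numpy as np

def evaluate_return(forgetting_factor: float) -> float:
    # Stand-in objective; in practice this would run the actor-critic learner
    # on the crawling robot and report the average episode return.
    return -(forgetting_factor - 0.97) ** 2

def meta_search(candidates):
    """Pick the parameter value with the best return (a simple meta-level search)."""
    scores = {c: evaluate_return(c) for c in candidates}
    return max(scores, key=scores.get)

best = meta_search(np.linspace(0.90, 0.999, 20))
print(f"selected forgetting factor: {best:.3f}")
```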

A Reinforcement Learning Method using TD-Error in Ant Colony System (개미 집단 시스템에서 TD-오류를 이용한 강화학습 기법)

  • Lee, Seung-Gwan;Chung, Tae-Choong
    • The KIPS Transactions:PartB, v.11B no.1, pp.77-82, 2004
  • In reinforcement learning, an agent receives a reward for the action it selects and the resulting state transition in the present state; this temporal credit-assignment problem is an important subject in reinforcement learning. In this paper, we examine Ant-Q learning, a meta-heuristic method proposed for hard combinatorial optimization problems such as the Traveling Salesman Problem (TSP); it is a population-based approach that uses positive feedback as well as greedy search. We then propose Ant-TD, a reinforcement learning method that applies a diversification strategy to the state transitions and incorporates the TD-error into the update. Experiments show that the proposed method finds an optimal solution faster than other reinforcement learning methods such as ACS and Ant-Q learning.
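For context, the sketch below implements a plain Ant-Q style value update on a small random TSP instance, the method the paper extends; the places where the paper's Ant-TD variant would inject a TD-error and a diversification strategy are only marked in comments, and the instance and parameter values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
cities = rng.uniform(0, 1, size=(10, 2))
dist = np.linalg.norm(cities[:, None] - cities[None, :], axis=-1) + np.eye(10)
AQ = np.full((10, 10), 1.0 / dist.mean())        # Ant-Q values, one per edge
alpha, gamma, beta = 0.1, 0.3, 2.0

def build_tour():
    tour, unvisited = [0], set(range(1, 10))
    while unvisited:
        r = tour[-1]
        cand = np.array(sorted(unvisited))
        score = AQ[r, cand] * (1.0 / dist[r, cand]) ** beta
        s = cand[np.argmax(score)]               # greedy (exploitation) choice
        # Ant-Q local update: move toward the discounted best value of the next city
        AQ[r, s] = (1 - alpha) * AQ[r, s] + alpha * gamma * AQ[s].max()
        tour.append(s); unvisited.discard(s)
    return tour

def global_update(tour):
    length = sum(dist[tour[i], tour[(i + 1) % len(tour)]] for i in range(len(tour)))
    for i in range(len(tour)):                   # reinforce the edges of the tour
        r, s = tour[i], tour[(i + 1) % len(tour)]
        AQ[r, s] = (1 - alpha) * AQ[r, s] + alpha * (1.0 / length)
        # The paper's Ant-TD would use a TD-error here instead of 1/length.
    return length

for it in range(50):
    length = global_update(build_tour())
print(f"tour length after training: {length:.3f}")
```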

Multicast Tree Generation using Meta Reinforcement Learning in SDN-based Smart Network Platforms

  • Chae, Jihun;Kim, Namgi
    • KSII Transactions on Internet and Information Systems (TIIS), v.15 no.9, pp.3138-3150, 2021
  • Multimedia services on the Internet are continuously increasing, and so is the demand for technology that delivers multimedia traffic efficiently. The multicast technique, which delivers the same content to several destinations, is constantly being developed: content is delivered from a source to all destinations through a multicast tree, and a low-cost tree increases the utilization of network resources. However, finding the optimal multicast tree with minimum link cost is very difficult; its computational complexity equals that of the Steiner tree problem, which is NP-complete. We therefore need an effective way to obtain a low-cost multicast tree with little calculation time on SDN-based smart network platforms. In this paper, we propose a new multicast tree generation algorithm that produces a multicast tree using an agent trained by model-based meta reinforcement learning. Experiments verified that the proposed algorithm generates multicast trees in less time than existing approximation algorithms and, in a dynamic network environment, produces lower-cost trees than the previous DQN-based algorithm.
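The meta-RL agent itself cannot be reconstructed from the abstract; as background, the sketch below shows the kind of classical Steiner-tree approximation baseline the paper compares against, using networkx to build a low-cost multicast tree connecting a source to its destinations on an illustrative weighted graph.

```python
import networkx as nx
from networkx.algorithms.approximation import steiner_tree

G = nx.Graph()
G.add_weighted_edges_from([
    ("s", "a", 1), ("a", "d1", 2), ("a", "b", 1),
    ("b", "d2", 1), ("s", "c", 4), ("c", "d2", 1), ("b", "d3", 3),
])
terminals = ["s", "d1", "d2", "d3"]              # multicast source and destinations

tree = steiner_tree(G, terminals, weight="weight")
cost = sum(d["weight"] for _, _, d in tree.edges(data=True))
print(sorted(tree.edges()), "cost:", cost)
```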

A Dynamic Asset Allocation Method based on Reinforcement learning Exploiting Local Traders (지역 투자 정책을 이용한 강화학습 기반 동적 자산 할당 기법)

  • O Jangmin;Lee Jongwoo;Zhang Byoung-Tak
    • Journal of KIISE:Software and Applications, v.32 no.8, pp.693-703, 2005
  • Given local traders with pattern-based multi-predictors of stock prices, we study a dynamic asset allocation method that maximizes trading performance. To optimize the proportion of the asset allocated to each predictor's recommendation, we design an asset allocation strategy called a meta policy within the reinforcement learning framework. We use both each predictor's recommendations and the ratio of the stock fund to the total asset to describe the state space efficiently. Experimental results on the Korean stock market show that the trading system with the proposed meta policy outperforms systems with fixed asset allocation methods, which means that reinforcement learning can bring synergy to the decision-making problem by exploiting supervised-learned predictors.
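As a toy illustration of the meta-policy idea under stated assumptions, the sketch below trains a tabular Q-learner whose state combines a predictor's buy recommendation with the current stock-to-asset ratio bucket and whose action is the target allocation; the synthetic returns and the reward definition are purely illustrative, not the paper's market model.

```python
import numpy as np

rng = np.random.default_rng(1)
actions = [0.0, 0.25, 0.5, 0.75, 1.0]            # target stock-fund proportion
Q = np.zeros((2, len(actions), len(actions)))    # (recommendation, ratio bucket, action)
alpha, gamma, eps = 0.1, 0.9, 0.1

signal, ratio_idx = int(rng.integers(0, 2)), 0
for step in range(5000):
    if rng.random() < eps:
        a = int(rng.integers(len(actions)))
    else:
        a = int(Q[signal, ratio_idx].argmax())
    market = rng.normal(0.01 if signal else -0.01, 0.02)   # returns weakly follow the signal
    reward = actions[a] * market                 # profit or loss on the allocated proportion
    next_signal, next_ratio = int(rng.integers(0, 2)), a   # rebalance to the chosen bucket
    Q[signal, ratio_idx, a] += alpha * (reward + gamma * Q[next_signal, next_ratio].max()
                                        - Q[signal, ratio_idx, a])
    signal, ratio_idx = next_signal, next_ratio

print("allocation when the predictor recommends buying:", actions[int(Q[1, 0].argmax())])
print("allocation when the predictor does not recommend:", actions[int(Q[0, 0].argmax())])
```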

Federated Deep Reinforcement Learning Based on Privacy Preserving for Industrial Internet of Things (산업용 사물 인터넷을 위한 프라이버시 보존 연합학습 기반 심층 강화학습 모델)

  • Chae-Rim Han;Sun-Jin Lee;Il-Gu Lee
    • Journal of the Korea Institute of Information Security & Cryptology, v.33 no.6, pp.1055-1065, 2023
  • Recently, various studies using deep reinforcement learning (deep RL) have been conducted to solve complex problems with the big data collected in the industrial Internet of Things (IIoT). Deep RL uses reinforcement learning's trial-and-error algorithms and cumulative reward functions to generate and learn its own data and to quickly explore neural network structures and parameter decisions. However, studies so far have shown that the larger the learning data, the higher the memory usage and search time, and the lower the accuracy. In this study, model-agnostic learning was used for efficient federated deep RL to mitigate privacy invasion: it increased robustness by 55.9%, achieved 97.8% accuracy (an improvement of 5.5% over the comparative optimization-based meta-learning models), and reduced the delay time by 28.9% on average.
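The sketch below is a minimal first-order MAML-style loop of the kind the paper's model-agnostic meta-learning builds on: each client adapts a shared parameter on its own local data, and only the adapted losses drive the meta-update, so raw local data never has to leave the device. The toy one-dimensional tasks, learning rates, and quadratic loss are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
theta = 0.0                                      # shared (meta) parameter
inner_lr, outer_lr = 0.1, 0.05

def loss_grad(theta, target):
    """Gradient of the per-task loss 0.5 * (theta - target)^2."""
    return theta - target

for meta_step in range(200):
    task_targets = rng.normal(loc=2.0, scale=0.5, size=4)   # one "task" per client
    meta_grad = 0.0
    for target in task_targets:
        adapted = theta - inner_lr * loss_grad(theta, target)   # inner adaptation step
        # first-order MAML: the outer gradient is evaluated at the adapted parameter
        meta_grad += loss_grad(adapted, target)
    theta -= outer_lr * meta_grad / len(task_targets)

print(f"meta-learned initialization: {theta:.3f}")   # converges near the task mean
```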

A Study about Additional Reinforcement in Local Updating and Global Updating for Efficient Path Search in Ant Colony System (Ant Colony System에서 효율적 경로 탐색을 위한 지역갱신과 전역갱신에서의 추가 강화에 관한 연구)

  • Lee, Seung-Gwan;Chung, Tae-Choong
    • The KIPS Transactions:PartB, v.10B no.3, pp.237-242, 2003
  • The Ant Colony System (ACS) algorithm is a new meta-heuristic for hard combinatorial optimization problems. It is a population-based approach that exploits positive feedback as well as greedy search, and it was first proposed for tackling the well-known Traveling Salesman Problem (TSP). In this paper, we introduce a new ACS method that adds a reinforcement value for each visited edge to the local and global updating rules. Performance results under various conditions are reported, and the proposed method is compared with the original ACS. It turns out that the proposed method can compete with the original ACS in terms of solution quality and computation speed on these problems.
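For reference, the sketch below writes out the standard ACS local and global pheromone-updating rules that the paper modifies; the place where its additional reinforcement value for each visited edge would enter is marked by the `extra` argument, and the parameter values and the tiny pheromone matrix are illustrative.

```python
import numpy as np

tau0, rho, alpha = 0.1, 0.1, 0.1
tau = np.full((5, 5), tau0)                      # pheromone per edge of a 5-city TSP

def local_update(r, s, extra=0.0):
    """ACS local rule applied when an ant traverses edge (r, s).
    `extra` is where the paper's additional reinforcement would be injected."""
    tau[r, s] = (1 - rho) * tau[r, s] + rho * (tau0 + extra)

def global_update(best_tour, best_length, extra=0.0):
    """ACS global rule: only edges of the best tour found so far are reinforced."""
    for i in range(len(best_tour)):
        r, s = best_tour[i], best_tour[(i + 1) % len(best_tour)]
        tau[r, s] = (1 - alpha) * tau[r, s] + alpha * (1.0 / best_length + extra)

local_update(0, 1)
global_update([0, 2, 4, 1, 3], best_length=12.5)
print(tau.round(3))
```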

Using Reinforcement Learning Agent for Metaverse Game Testing (메타버스 게임 테스팅에 대한 강화학습 에이전트 활용)

  • Lee, Hakjin;Lee, Scott Uk-Jin
    • Proceedings of the Korean Society of Computer Information Conference, 2022.07a, pp.135-136, 2022
  • "Metaverse" is a compound of "meta", meaning virtual or abstract, and "universe", meaning the real world, and refers to a virtual three-dimensional space. The metaverse is currently identified as a key trend of the Fourth Industrial Revolution, and investment by various companies continues to grow. Games account for the largest share of the metaverse, but the reliability of metaverse games is still not high, and suitable testing techniques are needed for the games that have been or will be released. In this paper, we propose a way to use reinforcement learning agents for testing metaverse-based games.

Neuro-fuzzy optimisation to model the phenomenon of failure by punching of a slab-column connection without shear reinforcement

  • Hafidi, Mariam;Kharchi, Fattoum;Lefkir, Abdelouhab
    • Structural Engineering and Mechanics, v.47 no.5, pp.679-700, 2013
  • Two new predictive design methods are presented in this study. The first is a hybrid method, called neuro-fuzzy, based on neural networks with fuzzy learning. A total of 280 experimental datasets obtained from the literature on concentric punching shear tests of reinforced concrete slab-column connections without shear reinforcement were used to build and test the model (194 for model development and 86 for validation), and the model was endorsed by statistical validation criteria. The punching shear strength predicted by the neuro-fuzzy model was compared with that predicted by current punching shear models widely used in design practice, such as ACI 318-08, SIA 262 and CBA 93. The neuro-fuzzy model showed high predictive accuracy of resistance to punching according to all of the relevant codes. A second, more user-friendly design method is presented, based on a predictive linear regression model that incorporates all of the geometric and material parameters involved in predicting punching shear. Despite its simplicity, this formulation showed accuracy equivalent to that of the neuro-fuzzy model.
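The second method in the abstract is a plain linear regression over the geometric and material parameters; the sketch below only illustrates that workflow (fit on one split of 194 samples, validate on the remaining 86) with synthetic stand-in features, since the actual experimental datasets and feature set are not reproduced here.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.uniform(size=(280, 5))     # stand-ins for slab depth, column size, concrete strength, etc.
y = X @ np.array([3.0, 1.5, 2.0, 4.0, 0.5]) + rng.normal(0, 0.1, 280)   # synthetic strength values

X_train, X_val, y_train, y_val = train_test_split(X, y, train_size=194, random_state=0)
model = LinearRegression().fit(X_train, y_train)
print(f"validation R^2: {model.score(X_val, y_val):.3f}")
```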

Reinforcement Post-Processing and Feedback Algorithm for Optimal Combination in Bottom-Up Hierarchical Classification (상향식 계층분류의 최적화 된 병합을 위한 후처리분석과 피드백 알고리즘)

  • Choi, Yun-Jeong;Park, Seung-Soo
    • The KIPS Transactions:PartB, v.17B no.2, pp.139-148, 2010
  • This paper presents a reinforcement post-processing method and a feedback algorithm that improve the category-assignment step in classification. We focus especially on complex documents, which are generally considered hard to classify. The basic factors in a traditional classification system are the training methodology, the classification models, and the document features; documents containing shared features and multiple meanings must be mined and analyzed more deeply than ordinarily formatted data. To address such documents, our previous studies proposed a method that expands the classification scheme using an automatically detected decision boundary. The assignment method in which a document is simply assigned to the top-ranked category is the main factor we focus on here. In this paper, we propose a post-processing method and a feedback algorithm that analyze the relevance of the ranked list. In experiments, we applied the post-processing method and a one-time feedback algorithm to complex documents. The results show that the system improves accuracy and flexibility without changing the classification algorithm itself.
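As a small illustration of the assignment-with-feedback idea described in the abstract, the sketch below commits to the top-ranked category only when its score margin over the runner-up is large enough, and otherwise returns the ambiguous candidates for a feedback pass; the margin threshold and the feedback action are assumptions, not the paper's exact procedure.

```python
def assign_with_feedback(ranked_scores, margin=0.10):
    """ranked_scores: list of (category, score) pairs sorted by score, descending."""
    (top_cat, top), (second_cat, second) = ranked_scores[0], ranked_scores[1]
    if top - second >= margin:
        return top_cat, None                     # confident: assign the top-ranked category
    # ambiguous: return both candidates so a feedback pass can re-analyze the document
    return None, [top_cat, second_cat]

label, feedback = assign_with_feedback([("sports", 0.41), ("politics", 0.39), ("tech", 0.20)])
print(label, feedback)                           # None ['sports', 'politics'] -> needs feedback
```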