• Title/Summary/Keyword: Learning Agent

Search Result 448, Processing Time 0.03 seconds

Cooperative Multi-Agent Reinforcement Learning-Based Behavior Control of Grid Sortation Systems in Smart Factory (스마트 팩토리에서 그리드 분류 시스템의 협력적 다중 에이전트 강화 학습 기반 행동 제어)

  • Choi, HoBin;Kim, JuBong;Hwang, GyuYoung;Kim, KwiHoon;Hong, YongGeun;Han, YounHee
    • KIPS Transactions on Computer and Communication Systems
    • /
    • v.9 no.8
    • /
    • pp.171-180
    • /
    • 2020
  • Smart Factory consists of digital automation solutions throughout the production process, including design, development, manufacturing and distribution, and it is an intelligent factory that installs IoT in its internal facilities and machines to collect process data in real time and analyze them so that it can control itself. The smart factory's equipment works in a physical combination of numerous hardware, rather than a virtual character being driven by a single object, such as a game. In other words, for a specific common goal, multiple devices must perform individual actions simultaneously. By taking advantage of the smart factory, which can collect process data in real time, if reinforcement learning is used instead of general machine learning, behavior control can be performed without the required training data. However, in the real world, it is impossible to learn more than tens of millions of iterations due to physical wear and time. Thus, this paper uses simulators to develop grid sortation systems focusing on transport facilities, one of the complex environments in smart factory field, and design cooperative multi-agent-based reinforcement learning to demonstrate efficient behavior control.

A Study on Machine Learning and Basic Algorithms (기계학습 및 기본 알고리즘 연구)

  • Kim, Dong-Hyun;Lee, Tae-ho;Lee, Byung-Jun;Kim, Kyung-Tae;Youn, Hee-Yong
    • Proceedings of the Korean Society of Computer Information Conference
    • /
    • 2018.07a
    • /
    • pp.35-36
    • /
    • 2018
  • 본 논문에서는 기계학습 및 기계학습 기법 중에서도 Markov Decision Process (MDP)를 기반으로 하는 강화학습에 대해 알아보고자 한다. 강화학습은 기계학습의 일종으로 주어진 환경 안에서 의사결정자(Agent)는 현재의 상태를 인식하고 가능한 행동 집합 중에서 보상을 극대화할 수 있는 행동을 선택하는 방법이다. 일반적인 기계학습과는 달리 강화학습은 학습에 필요한 사전 지식을 요구하지 않기 때문에 불명확한 환경 속에서도 반복 학습이 가능하다. 본 연구에서는 일반적인 강화학습 및 강화학습 중에서 가장 많이 사용되고 있는 Q-learning 에 대해 간략히 설명한다.

  • PDF

A Study on the U-learning Service Application Based on the Context Awareness (상황인지기반 U-Learning 응용서비스)

  • Lee, Kee-O;Lee, Hyun-Chang;Shin, Hyun-Cheul
    • Convergence Security Journal
    • /
    • v.8 no.4
    • /
    • pp.81-89
    • /
    • 2008
  • This paper introduces u-learning service model based on context awareness. Also, it concentrates on agent-based WPAN technology, OSGi based middleware design, and the application mechanism such as context manager/profile manager provided by agents/server. Especially, we'll introduce the meta structure and its management algorithm, which can be updated with learning experience dynamically. So, we can provide learner with personalized profile and dynamic context for seamless learning service. The OSGi middleware is applied to our meta structure as a conceptual infrastructure.

  • PDF

A Study on Deep Reinforcement Learning Framework for DME Pulse Design

  • Lee, Jungyeon;Kim, Euiho
    • Journal of Positioning, Navigation, and Timing
    • /
    • v.10 no.2
    • /
    • pp.113-120
    • /
    • 2021
  • The Distance Measuring Equipment (DME) is a ground-based aircraft navigation system and is considered as an infrastructure that ensures resilient aircraft navigation capability during the event of a Global Navigation Satellite System (GNSS) outage. The main problem of DME as a GNSS back up is a poor positioning accuracy that often reaches over 100 m. In this paper, a novel approach of applying deep reinforcement learning to a DME pulse design is introduced to improve the DME distance measuring accuracy. This method is designed to develop multipath-resistant DME pulses that comply with current DME specifications. In the research, a Markov Decision Process (MDP) for DME pulse design is set using pulse shape requirements and a timing error. Based on the designed MDP, we created an Environment called PulseEnv, which allows the agent representing a DME pulse shape to explore continuous space using the Soft Actor Critical (SAC) reinforcement learning algorithm.

On the Reward Function of Latent SAC Reinforcement Learning to Improve Longitudinal Driving Performance (종방향 주행성능향상을 위한 Latent SAC 강화학습 보상함수 설계)

  • Jo, Sung-Bean;Jeong, Han-You
    • Journal of IKEEE
    • /
    • v.25 no.4
    • /
    • pp.728-734
    • /
    • 2021
  • In recent years, there has been a strong interest in the end-to-end autonomous driving based on deep reinforcement learning. In this paper, we present a reward function of latent SAC deep reinforcement learning to improve the longitudinal driving performance of an agent vehicle. While the existing reward function significantly degrades the driving safety and efficiency, the proposed reward function is shown to maintain an appropriate headway distance while avoiding the front vehicle collision.

Implementation of Target Object Tracking Method using Unity ML-Agent Toolkit (Unity ML-Agents Toolkit을 활용한 대상 객체 추적 머신러닝 구현)

  • Han, Seok Ho;Lee, Yong-Hwan
    • Journal of the Semiconductor & Display Technology
    • /
    • v.21 no.3
    • /
    • pp.110-113
    • /
    • 2022
  • Non-playable game character plays an important role in improving the concentration of the game and the interest of the user, and recently implementation of NPC with reinforcement learning has been in the spotlight. In this paper, we estimate an AI target tracking method via reinforcement learning, and implement an AI-based tracking agency of specific target object with avoiding traps through Unity ML-Agents Toolkit. The implementation is built in Unity game engine, and simulations are conducted through a number of experiments. The experimental results show that outstanding performance of the tracking target with avoiding traps is shown with good enough results.

VA Design of Personalized e-Learning System for the Driver's License Test in Korea (개인 맞춤형 운전면허 학습시스템 설계)

  • Oh, Yong-Sun
    • Proceedings of the Korea Contents Association Conference
    • /
    • 2009.05a
    • /
    • pp.1055-1060
    • /
    • 2009
  • In this paper, we design an e-Learning system for the Driver's License Teste studying through the Internet. The proposed system make users to be arrived at the goal for the license in a shorter time by offering learning contents and items according to the item-responses made by the users based on the Item Response Theory. Moreover we design the scheme to give the optimum items and the most necessary content to the user during the learning procedure in the form of concept-based objects. All the items in the problem bank DB maintain their difficulties, discriminations, and guessing parameters as is the case of 3-parameter logistic model. In addition user profile DB stores users' status informations, item responses, and ability parameters. Using these structures and combining agents, we can offer the optimum learning process or dynamic personalized studying structure to the user. We can construct interface agent and content selection and feedback agent with the DB's described above. User can study without any awareness of system operations or personal fitting scheme.

  • PDF

Multi Colony Intensification.Diversification Interaction Ant Reinforcement Learning Using Temporal Difference Learning (Temporal Difference 학습을 이용한 다중 집단 강화.다양화 상호작용 개미 강화학습)

  • Lee Seung-Gwan
    • The Journal of the Korea Contents Association
    • /
    • v.5 no.5
    • /
    • pp.1-9
    • /
    • 2005
  • In this paper, we suggest multi colony interaction ant reinforcement learning model. This method is a hybrid of multi colony interaction by elite strategy and reinforcement teaming applying Temporal Difference(TD) learning to Ant-Q loaming. Proposed model is consisted of some independent AS colonies, and interaction achieves search according to elite strategy(Intensification, Diversification strategy) between the colonies. Intensification strategy enables to select of good path to use heuristic information of other agent colony. This makes to select the high frequency of the visit of a edge by agents through positive interaction of between the colonies. Diversification strategy makes to escape selection of the high frequency of the visit of a edge by agents achieve negative interaction by search information of other agent colony. Through this strategies, we could know that proposed reinforcement loaming method converges faster to optimal solution than original ACS and Ant-Q.

  • PDF

Implementation of a Video Retrieval System Using Annotation and Comparison Area Learning of Key-Frames (키 프레임의 주석과 비교 영역 학습을 이용한 비디오 검색 시스템의 구현)

  • Lee Keun-Wang;Kim Hee-Sook;Lee Jong-Hee
    • Journal of Korea Multimedia Society
    • /
    • v.8 no.2
    • /
    • pp.269-278
    • /
    • 2005
  • In order to process video data effectively, it is required that the content information of video data is loaded in database and semantics-based retrieval method can be available for various queries of users. In this paper, we propose a video retrieval system which support semantics retrieval of various users for massive video data by user's keywords and comparison area learning based on automatic agent. By user's fundamental query and selection of image for key frame that extracted from query, the agent gives the detail shape for annotation of extracted key frame. Also, key frame selected by user becomes a query image and searches the most similar key frame through color histogram comparison and comparison area learning method that proposed. From experiment, the designed and implemented system showed high precision ratio in performance assessment more than 93 percents.

  • PDF

Improvement of Learning Behavior of Mice by an Antiacetylcholinesterase and Neuroprotective Agent NX42, a Laminariales-Alga Extract (Acetylcholinesterase 억제 및 신경세포 보호 활성을 갖는 다시마목 해조 추출물 NX42의 마우스 학습능력 향상 효과)

  • Lee, Bong-Ho;Stein, Steven M.
    • Korean Journal of Food Science and Technology
    • /
    • v.36 no.6
    • /
    • pp.974-978
    • /
    • 2004
  • Brown-alga-derived natural agent NX42, mainly composed of algal polysaccharides and phlorotannins, showed mild but dose-dependent inhibition of acetylcholinesterase with $IC_{50}=600-700\;{\mu}g/mL$. Phlorotannin-rich fraction of NX42 showed substantial increase of the activity by more than one order of magnitude ($IC_{50}=54\;{\mu}g/mL$) and significant protection of SK-N-SH cells from oxidative stress by $H_2O_2$. Learning trials of mice for 5 consecutive days revealed electric-shock treatment during learning period significantly retarded learning process, whereas NX42-treated mice showed significant resistance against leaning deficiency possibly mainly due to anticholinesterase and neuroprotective activities of phlorotannin.