• Title/Summary/Keyword: Reinforcement Learning (RL)

Federated Deep Reinforcement Learning Based on Privacy Preserving for Industrial Internet of Things (산업용 사물 인터넷을 위한 프라이버시 보존 연합학습 기반 심층 강화학습 모델)

  • Chae-Rim Han;Sun-Jin Lee;Il-Gu Lee
    • Journal of the Korea Institute of Information Security & Cryptology
    • /
    • v.33 no.6
    • /
    • pp.1055-1065
    • /
    • 2023
  • Recently, various studies using deep reinforcement learning (deep RL) technology have been conducted to solve complex problems using the big data collected in the industrial internet of things (IIoT). Deep RL uses reinforcement learning's trial-and-error algorithms and cumulative reward functions to generate and learn its own data and to quickly explore neural network structures and parameter decisions. However, studies so far have shown that as the size of the training data grows, memory usage and search time increase while accuracy decreases. In this study, model-agnostic learning for efficient federated deep RL was used to mitigate privacy invasion, increasing robustness by 55.9%, achieving 97.8% accuracy (a 5.5% improvement over comparable optimization-based meta-learning models), and reducing delay time by 28.9% on average.
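
A minimal Python sketch of the federated idea this abstract describes: each IIoT client trains a policy locally and only parameter updates are averaged on a server, so raw data never leaves the device. The functions, shapes, and learning rate are illustrative assumptions, not the authors' implementation.

```python
# Federated averaging of locally trained policy parameters (toy sketch).
import numpy as np

def local_update(weights, lr=0.01):
    # Placeholder for one round of on-device deep-RL training
    # (e.g., a few policy-gradient steps on locally generated experience).
    fake_gradient = np.random.randn(*weights.shape) * 0.1
    return weights - lr * fake_gradient

def federated_average(client_weights):
    # FedAvg: the server averages parameters; raw experience stays on-device,
    # which is the privacy-preserving part of the scheme.
    return np.mean(client_weights, axis=0)

global_weights = np.zeros(8)  # toy policy parameter vector
for _ in range(5):            # communication rounds
    updates = [local_update(global_weights) for _ in range(4)]  # 4 IIoT clients
    global_weights = federated_average(updates)
print(global_weights)
```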

Research Trends on Inverse Reinforcement Learning (역강화학습 기술 동향)

  • Lee, S.K.;Kim, D.W.;Jang, S.H.;Yang, S.I.
    • Electronics and Telecommunications Trends
    • /
    • v.34 no.6
    • /
    • pp.100-107
    • /
    • 2019
  • Recently, reinforcement learning (RL) has expanded from research in virtual simulation environments to a wide range of applications, such as autonomous driving, natural language processing, recommendation systems, and disease diagnosis. However, RL remains difficult to apply in these complex real-world environments. In contrast, inverse reinforcement learning (IRL) can obtain optimal policies in various situations; furthermore, it can use expert demonstration data to achieve its target task. In particular, IRL is expected to be a key technology for artificial general intelligence research that can successfully perform human intellectual tasks. In this report, we briefly summarize various IRL techniques and research directions.
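
A toy illustration of the core IRL idea mentioned above: recover a reward function under which expert demonstrations look better than the current policy's behavior. This is a one-step, feature-matching simplification (in the spirit of apprenticeship learning), with all quantities invented for illustration.

```python
# Reward-weight recovery from expert demonstrations (toy IRL sketch).
import numpy as np

def feature_expectations(trajectories, gamma=0.9):
    # Discounted average of state features over a set of trajectories.
    mu = np.zeros(len(trajectories[0][0]))
    for traj in trajectories:
        for t, phi in enumerate(traj):
            mu += (gamma ** t) * phi
    return mu / len(trajectories)

expert  = [[np.array([1.0, 0.0, 0.5])] * 5]  # hypothetical expert features
learner = [[np.array([0.2, 0.8, 0.1])] * 5]  # features under the current policy

# Direction in reward space that makes expert behavior score higher; a full
# IRL loop would re-solve the RL problem with reward w . phi(s) and iterate.
w = feature_expectations(expert) - feature_expectations(learner)
print(w)
```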

Leveraging Reinforcement Learning for Generating Construction Workers' Moving Path: Opportunities and Challenges

  • Kim, Minguk;Kim, Tae Wan
    • International conference on construction engineering and project management
    • /
    • 2022.06a
    • /
    • pp.1085-1092
    • /
    • 2022
  • Travel distance is a parameter mainly used in the objective function of Construction Site Layout Planning (CSLP) automation models. To obtain travel distance, common approaches, such as linear distance, shortest-distance algorithms, visibility graphs, and access road paths, concentrate only on identifying the shortest path. However, humans do not necessarily follow the one shortest path; they may choose a safer and more comfortable path according to their situation, within a reasonable range. Thus, paths generated by these approaches may differ from the actual paths of workers, which may reduce the reliability of the optimized construction site layout. To solve this problem, this paper adopts reinforcement learning (RL), inspired by various concepts from cognitive science and behavioral psychology, to generate a realistic path that mimics the decision-making and behavioral processes of workers' wayfinding on the construction site. To do so, this paper investigates human wayfinding tendencies and the characteristics of the walking environment of construction sites, and emphasizes the importance of taking these into account when simulating the actual paths of workers. Furthermore, a simulation developed by mapping the identified tendencies to the reward design shows that the RL agent behaves like a real construction worker. Based on the research findings, some opportunities and challenges are proposed. This study contributes to simulating the potential paths of workers based on deep RL, which can be utilized to calculate the travel distance in CSLP automation models, providing more reliable solutions.
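
A minimal sketch of the reward-design idea in this abstract: a tabular Q-learning agent on a toy grid whose reward mixes a time cost with a penalty near hazardous cells, so the learned path can reasonably deviate from the strict shortest path. The grid, hazards, and weights are hypothetical, not the paper's simulation.

```python
# Q-learning on a grid with a safety-shaped reward (toy sketch).
import random

W, H = 6, 6
GOAL, HAZARDS = (5, 5), {(2, 2), (3, 2), (2, 3)}
ACTIONS = [(1, 0), (-1, 0), (0, 1), (0, -1)]
Q = {}

def step(s, a):
    ns = (min(max(s[0] + a[0], 0), W - 1), min(max(s[1] + a[1], 0), H - 1))
    r = -1.0                       # time cost: encourages progress
    if ns in HAZARDS:
        r -= 5.0                   # safety penalty: models avoiding unsafe areas
    if ns == GOAL:
        r += 20.0
    return ns, r

for _ in range(2000):
    s = (0, 0)
    while s != GOAL:
        a = (random.choice(ACTIONS) if random.random() < 0.1
             else max(ACTIONS, key=lambda x: Q.get((s, x), 0.0)))
        ns, r = step(s, a)
        best = max(Q.get((ns, b), 0.0) for b in ACTIONS)
        Q[(s, a)] = Q.get((s, a), 0.0) + 0.1 * (r + 0.95 * best - Q.get((s, a), 0.0))
        s = ns
```

Tuning the hazard penalty against the time cost is the kind of trade-off the authors encode when mapping wayfinding tendencies onto the reward design.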

Gain Tuning for SMCSPO of Robot Arm with Q-Learning (Q-Learning을 사용한 로봇팔의 SMCSPO 게인 튜닝)

  • Lee, JinHyeok;Kim, JaeHyung;Lee, MinCheol
    • The Journal of Korea Robotics Society
    • /
    • v.17 no.2
    • /
    • pp.221-229
    • /
    • 2022
  • Sliding mode control (SMC) is a robust control method for a robot arm with nonlinear properties. Although SMC achieves adequate control performance with a high switching gain even without an exact robot model containing nonlinear and uncertainty terms, a high switching gain causes chattering. To solve this problem, SMC with a sliding perturbation observer (SMCSPO) has been researched; this method reduces chattering by compensating for the perturbation estimated by the observer and then choosing a lower switching control gain. However, optimal gain tuning is still necessary to obtain better tracking performance and less chattering. This paper proposes a method in which Q-learning automatically tunes the control gains of SMCSPO through iterative operation. In this tuning method, the reward of reinforcement learning (RL) is set to the negative of the state tracking errors, and the RL action is a change of control gain that maximizes the reward as the number of iterations increases. A simple motion test for a 7-DOF robot arm was simulated in MATLAB to verify this RL tuning algorithm. The simulation showed that this method can automatically tune the control gains for SMCSPO.
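
A sketch of the tuning loop this abstract describes: the RL action nudges a controller gain up or down along a grid, and the reward is the negative tracking error, so the value estimates gravitate toward the best gain. The one-line "plant" below is a stand-in for the SMCSPO simulation, and the bandit-style update is a simplification of the paper's Q-learning.

```python
# Epsilon-greedy gain tuning with reward = minus tracking error (toy sketch).
import random

GAINS = [0.5 * k for k in range(1, 11)]  # candidate switching-gain grid
Q = {g: 0.0 for g in GAINS}

def tracking_error(gain):
    # Toy stand-in: low gain tracks poorly, excessive gain adds chattering.
    return abs(2.0 - gain) + 0.05 * gain ** 2

idx = 0
for _ in range(2000):
    if random.random() < 0.2:                     # explore: random step
        step = random.choice([-1, 1])
    else:                                         # exploit: step toward better neighbor
        lo, hi = max(0, idx - 1), min(len(GAINS) - 1, idx + 1)
        step = -1 if Q[GAINS[lo]] > Q[GAINS[hi]] else 1
    idx = max(0, min(len(GAINS) - 1, idx + step))
    g = GAINS[idx]
    Q[g] += 0.1 * (-tracking_error(g) - Q[g])     # reward = minus tracking error
print(max(Q, key=Q.get))                          # tuned gain (near 2.0 in this toy)
```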

Real-time RL-based 5G Network Slicing Design and Traffic Model Distribution: Implementation for V2X and eMBB Services

  • WeiJian Zhou;Azharul Islam;KyungHi Chang
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.17 no.9
    • /
    • pp.2573-2589
    • /
    • 2023
  • As 5G mobile systems carry multiple services and applications, supporting numerous user and application types with varying quality-of-service requirements inside a single physical network infrastructure is the primary problem in constructing 5G networks. Radio Access Network (RAN) slicing is introduced as a way to solve these challenges. This research focuses on optimizing RAN slices within a single physical cell for vehicle-to-everything (V2X) and enhanced mobile broadband (eMBB) UEs, highlighting the importance of adept resource management and allocation for the evolving landscape of 5G services. We put forth two distinct strategies: offline network slicing, also referred to as standard network slicing, and online reinforcement learning (RL) network slicing. Both strategies aim to maximize network efficiency by gathering network model characteristics and augmenting radio resources for eMBB and V2X UEs. When compared with traditional network slicing, RL network slicing shows greater performance in the allocation and utilization of UE resources. These steps are taken to adapt to fluctuating traffic loads using RL strategies, with the ultimate objective of bolstering the efficiency of generic 5G services.
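
A minimal sketch of the online RL slicing idea: a tabular agent splits a fixed budget of resource blocks between the V2X and eMBB slices each period and is rewarded for the traffic actually served. The traffic model, budget, and reward are invented for illustration, not the paper's system model.

```python
# Learning a V2X/eMBB resource-block split from fluctuating demand (toy sketch).
import random

RBS = 20                                   # resource blocks per scheduling period
SPLITS = list(range(RBS + 1))              # action: RBs reserved for the V2X slice
Q = {a: 0.0 for a in SPLITS}

def served(v2x_rbs):
    v2x_load = random.randint(4, 8)        # fluctuating V2X demand (in RBs)
    embb_load = random.randint(8, 14)      # fluctuating eMBB demand
    return min(v2x_rbs, v2x_load) + min(RBS - v2x_rbs, embb_load)

for _ in range(5000):
    a = random.choice(SPLITS) if random.random() < 0.1 else max(Q, key=Q.get)
    Q[a] += 0.05 * (served(a) - Q[a])      # reward: throughput actually delivered
print(max(Q, key=Q.get), "RBs reserved for V2X")
```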

Evolutionary Reinforcement Learning System with Time-Varying Parameters

  • Song, Se-Kyong;Choi, J.Y.;Sung, H.K.;Kwon, Dong-Soo
    • Institute of Control, Robotics and Systems (ICROS): Conference Proceedings
    • /
    • 2002.10a
    • /
    • pp.78.5-78
    • /
    • 2002
  • We propose an evolutionary reinforcement learning (RL) system with time-varying parameters that can deal with a dynamic environment. The proposed system has three characteristics: 1) it can easily deal with a dynamic environment by using time-varying parameters; 2) the division of the state space is acquired evolutionarily by a genetic algorithm (GA); 3) one does not have to design the rules constructing an agent in advance. Many RL systems have been proposed so far, but they adjust constant or non-time-varying parameters, which makes it difficult to realize appropriate behavior in complex and dynamic environments. Hence, we propose an RL system whose parameters can vary temporally. T...
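
A toy sketch of the evolutionary component: a GA evolves the thresholds that divide a one-dimensional state space, the paper's "division of state space acquired evolutionarily." The fitness function here is a made-up stand-in for episode return under the evolved partition.

```python
# GA evolving state-space partition thresholds (toy sketch).
import random

def fitness(thresholds):
    # Stand-in for RL episode return; a real system would run the agent
    # with this partition and measure accumulated reward.
    target = [0.25, 0.5, 0.75]  # hypothetical ideal partition
    return -sum(abs(t - g) for t, g in zip(sorted(thresholds), target))

pop = [[random.random() for _ in range(3)] for _ in range(20)]
for _ in range(50):
    pop.sort(key=fitness, reverse=True)
    parents = pop[:10]                                        # selection
    children = []
    for _ in range(10):
        a, b = random.sample(parents, 2)
        child = [random.choice(pair) for pair in zip(a, b)]   # crossover
        child = [min(1.0, max(0.0, t + random.gauss(0, 0.05)))
                 for t in child]                              # mutation
        children.append(child)
    pop = parents + children
print(sorted(pop[0]))  # best evolved thresholds
```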

Fault-tolerant control system for once-through steam generator based on reinforcement learning algorithm

  • Li, Cheng;Yu, Ren;Yu, Wenmin;Wang, Tianshu
    • Nuclear Engineering and Technology
    • /
    • v.54 no.9
    • /
    • pp.3283-3292
    • /
    • 2022
  • Based on the Deep Q-Network (DQN) algorithm of reinforcement learning, an active fault-tolerance method with incremental actions is proposed for the control system of a once-through steam generator (OTSG) with sensor faults. In this paper, we first establish the OTSG model as the interaction environment for the reinforcement learning agent. The agent chooses an action according to the system state obtained from the pressure sensor; the incremental action gradually approaches the optimal strategy for the current fault, and the agent then updates the network using the rewards obtained during interaction. In this way, the active fault-tolerant control process of the OTSG is transformed into the agent's decision-making process. Comparison experiments against a traditional reinforcement learning (RL) algorithm with fixed strategies show that the active fault-tolerant controller designed in this paper can control accurately and rapidly under sensor faults, so that the pressure of the OTSG is stabilized near the set-point value and the OTSG runs normally and stably.
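
A sketch of the incremental-action idea: rather than emitting an absolute command, each action nudges the current command by a small step, and the reward penalizes deviation from the pressure set-point. The one-line "OTSG" dynamics are a placeholder, and tabular Q-learning stands in for the paper's DQN.

```python
# Incremental-action control toward a pressure set-point (toy sketch).
import random

SETPOINT = 10.0
STEPS = (-0.1, 0.0, 0.1)                   # incremental actions on the command

def plant(pressure, command):
    # Toy first-order response of pressure to the control command.
    return pressure + 0.5 * (command - pressure)

Q = {}
for _ in range(300):
    pressure, command = 8.0, 8.0
    for _ in range(50):
        s = round(pressure, 1)             # discretized state from the sensor
        a = (random.choice(STEPS) if random.random() < 0.1
             else max(STEPS, key=lambda x: Q.get((s, x), 0.0)))
        command += a                       # the action is an increment
        pressure = plant(pressure, command)
        r = -abs(SETPOINT - pressure)      # reward: closeness to the set-point
        ns = round(pressure, 1)
        best = max(Q.get((ns, x), 0.0) for x in STEPS)
        Q[(s, a)] = Q.get((s, a), 0.0) + 0.1 * (r + 0.9 * best - Q.get((s, a), 0.0))
```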

Applying Deep Reinforcement Learning to Improve Throughput and Reduce Collision Rate in IEEE 802.11 Networks

  • Ke, Chih-Heng;Astuti, Lia
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.16 no.1
    • /
    • pp.334-349
    • /
    • 2022
  • The effectiveness of Wi-Fi networks is greatly influenced by the optimization of contention window (CW) parameters. Unfortunately, the conventional approach employed by IEEE 802.11 wireless networks is not scalable enough to sustain consistent performance as the number of stations increases, yet it is still the default channel-access mechanism for single-user 802.11 transmissions. Recently, there has been a spike in attempts to enhance network performance using a machine learning (ML) technique known as reinforcement learning (RL). Its advantage is interacting with the surrounding environment and making decisions based on its own experience. Deep RL (DRL) uses deep neural networks (DNNs) to deal with more complex environments (such as continuous state or action spaces) and to obtain optimal rewards. As a result, we present a new CW control mechanism, termed the contention window threshold (CWThreshold). It uses the DRL principle to define the threshold value and learn optimal settings under various network scenarios. We demonstrate our proposed method, a smart exponential-threshold-linear backoff algorithm with a deep Q-learning network (SETL-DQN). The simulation results show that our proposed SETL-DQN algorithm can effectively improve throughput and reduce collision rates.
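
A sketch of the backoff rule suggested by the name "exponential-threshold-linear": below a threshold the contention window doubles on collision, above it the window grows linearly. In the paper a DQN learns the threshold from network conditions; here it is fixed for illustration, and the constants follow common 802.11 defaults.

```python
# Exponential-threshold-linear contention-window backoff (illustrative sketch).
CW_MIN, CW_MAX = 16, 1024

def setl_backoff(cw, threshold, collided):
    if not collided:
        return CW_MIN                      # success resets the window
    if cw < threshold:
        return min(cw * 2, CW_MAX)         # exponential region
    return min(cw + CW_MIN, CW_MAX)        # linear region past the threshold

cw = CW_MIN
for collided in (True, True, True, True, False, True):
    cw = setl_backoff(cw, threshold=128, collided=collided)
    print(cw)                              # 32 64 128 144 16 32
```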

Labeling Q-Learning for Maze Problems with Partially Observable States

  • Lee, Hae-Yeon;Hiroyuki Kamaya;Kenich Abe
    • Institute of Control, Robotics and Systems (ICROS): Conference Proceedings
    • /
    • 2000.10a
    • /
    • pp.489-489
    • /
    • 2000
  • Recently, Reinforcement Learning (RL) methods have been used for learning problems in Partially Observable Markov Decision Process (POMDP) environments. Conventional RL methods, however, have limited applicability to POMDPs. To overcome the partial observability, several algorithms have been proposed [5], [7]. The aim of this paper is to extend our previous algorithm for POMDPs, called Labeling Q-learning (LQ-learning), which reinforces incomplete perceptual information with labeling. Namely, in LQ-learning, the agent perceives the current state as a pair of an observation and its label, so the agent can more exactly distinguish states that look the same. Labeling is carried out by a hash-like function, which we call the Labeling Function (LF). Numerous labeling functions can be considered, but in this paper we introduce several labeling functions based on only the 2 or 3 most recent observations. We briefly introduce the basic idea of LQ-learning, apply it to maze problems in simple POMDP environments, and show its effectiveness with empirical results that compare favorably with conventional RL algorithms.
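
A sketch of the labeling idea: the Q-table is indexed by (observation, label), where the label comes from a hash-like function over the last few observations, letting the agent tell apart states that produce identical observations. The maze observations and label alphabet are illustrative.

```python
# Augmenting aliased observations with history-based labels (toy sketch).
from collections import deque

HISTORY_LEN = 3                            # LF over the most recent observations

def labeling_function(history):
    # Hash-like Labeling Function (LF) mapping recent history to a small label set.
    return hash(tuple(history)) % 8

history = deque(maxlen=HISTORY_LEN)
Q = {}

def perceive(observation):
    history.append(observation)
    return (observation, labeling_function(history))  # augmented state

for obs in ["corridor", "corridor", "junction", "corridor"]:
    state = perceive(obs)
    Q.setdefault(state, 0.0)  # identical observations can get distinct Q entries
print(list(Q))
```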

A Comparison Study on Reinforcement Learning Method that Combines Supervised Knowledge (감독 지식을 융합하는 강화 학습 기법들에 대한 비교 연구)

  • Kim, S.W.;Chang, H.S.
    • Proceedings of the Korean Information Science Society Conference
    • /
    • 2007.06c
    • /
    • pp.303-308
    • /
    • 2007
  • The utility of the recently proposed potential-based RL technique, which incorporates supervised knowledge into reinforcement learning, has been proven through a guaranteed convergence to the theoretically optimal policy, and the superiority of the policy-reuse RL technique has been demonstrated through experimental comparison with conventional reinforcement learning that does not incorporate supervised knowledge; however, no study has yet compared policy-reuse RL with potential-based RL. In this paper, through an experimental performance comparison of the potential-based RL and policy-reuse RL techniques, we show that policy-reuse RL converges faster than potential-based RL, and also that the performance of policy-reuse RL is affected by the optimality of the reused policy.
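
For reference, potential-based reward shaping (the "potential-based RL" compared above) adds F(s, s') = γΦ(s') − Φ(s) to the environment reward, injecting supervised knowledge without changing the optimal policy (Ng et al., 1999). The potential function below is a hypothetical distance-to-goal estimate, not one from this paper.

```python
# Potential-based reward shaping (sketch with a hypothetical potential).
GAMMA = 0.95
GOAL = (5, 5)

def phi(state):
    # Supervised knowledge: potential rises as the state nears the goal.
    return -(abs(GOAL[0] - state[0]) + abs(GOAL[1] - state[1]))

def shaped_reward(reward, s, s_next):
    return reward + GAMMA * phi(s_next) - phi(s)

print(shaped_reward(-1.0, (0, 0), (1, 0)))  # a step toward the goal is boosted
```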
