• Title/Summary/Keyword: Deep Reinforcement Learning


A Distributed Scheduling Algorithm based on Deep Reinforcement Learning for Device-to-Device communication networks (단말간 직접 통신 네트워크를 위한 심층 강화학습 기반 분산적 스케쥴링 알고리즘)

  • Jeong, Moo-Woong;Kim, Lyun Woo;Ban, Tae-Won
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.24 no.11
    • /
    • pp.1500-1506
    • /
    • 2020
  • In this paper, we study a scheduling problem based on reinforcement learning for overlay device-to-device (D2D) communication networks. Although various technologies for D2D communication networks using Q-learning, one of the representative reinforcement learning models, have been studied, Q-learning incurs tremendous complexity as the number of states and actions increases. To solve this problem, D2D communication technologies based on Deep Q Network (DQN) have been studied. In this paper, we design a DQN model that reflects the characteristics of wireless communication systems and propose a distributed scheduling scheme based on this model that can reduce feedback and signaling overhead. The proposed model trains all parameters in a centralized manner and transfers the final trained parameters to all mobiles, each of which then determines its actions individually using the transferred parameters. We analyze the performance of the proposed scheme by computer simulation and compare it with the optimal scheme, an opportunistic selection scheme, and a full transmission scheme.
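The centralized-training / distributed-execution idea in this abstract can be sketched with a tiny linear Q-function; the state dimension, action set, and learning rate below are hypothetical stand-ins, not the paper's actual model:

```python
# Hypothetical setup: each mobile scores "stay silent" vs "transmit"
# with a shared linear Q-function Q(s, a) = w[a] . s.
STATE_DIM, N_ACTIONS = 4, 2

def q_values(w, s):
    return [sum(wi * si for wi, si in zip(w[a], s)) for a in range(N_ACTIONS)]

def central_train_step(w, s, a, r, s_next, lr=0.01, gamma=0.9):
    # One TD(0) update on the central copy of the parameters.
    target = r + gamma * max(q_values(w, s_next))
    td_error = target - q_values(w, s)[a]
    for i in range(STATE_DIM):
        w[a][i] += lr * td_error * s[i]
    return w

def mobile_act(w_broadcast, local_state):
    # Each mobile decides alone from the broadcast parameters,
    # avoiding per-slot feedback and signaling to a central node.
    qs = q_values(w_broadcast, local_state)
    return qs.index(max(qs))

w = [[0.0] * STATE_DIM for _ in range(N_ACTIONS)]
w = central_train_step(w, [1, 0, 0, 1], a=1, r=1.0, s_next=[0, 1, 1, 0])
print(mobile_act(w, [1, 0, 0, 1]))  # 1
```

The key property mirrored here is that training happens once on a central copy of `w`, after which every mobile only needs the broadcast parameters to act.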

Comparison of learning performance of character controller based on deep reinforcement learning according to state representation (상태 표현 방식에 따른 심층 강화 학습 기반 캐릭터 제어기의 학습 성능 비교)

  • Sohn, Chaejun;Kwon, Taesoo;Lee, Yoonsang
    • Journal of the Korea Computer Graphics Society
    • /
    • v.27 no.5
    • /
    • pp.55-61
    • /
    • 2021
  • Character motion control based on physics simulation using reinforcement learning continues to be actively studied. To solve a problem with reinforcement learning, the network structure, hyperparameters, state, action, and reward must be set appropriately for the problem. In many studies, various combinations of states, actions, and rewards have been defined and successfully applied. Since many such combinations are possible, numerous studies analyze the effect of each element to find the optimal combination that improves learning performance. In this work, we analyzed the effect of the state representation on reinforcement learning performance, which has not been studied before. First, we defined three coordinate systems, the root-attached frame, the root-aligned frame, and the projected-aligned frame, and analyzed how representing the state in each coordinate system affects reinforcement learning. Second, we analyzed how various combinations of joint positions and angles in the state affect learning performance.
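The difference between a root-attached and a root-aligned frame can be illustrated in 2D (a simplified sketch; the paper works with full 3D character state, and the numbers below are illustrative):

```python
import math

def to_root_attached(p, root_pos):
    # Translate only: axes stay world-aligned.
    return (p[0] - root_pos[0], p[1] - root_pos[1])

def to_root_aligned(p, root_pos, root_heading):
    # Translate, then rotate by the inverse of the root heading,
    # making the state invariant to the character's facing direction.
    dx, dy = p[0] - root_pos[0], p[1] - root_pos[1]
    c, s = math.cos(-root_heading), math.sin(-root_heading)
    return (c * dx - s * dy, s * dx + c * dy)

# A joint one unit ahead of a root facing 90 degrees:
joint, root, heading = (0.0, 1.0), (0.0, 0.0), math.pi / 2
print(to_root_attached(joint, root))          # (0.0, 1.0)
print(to_root_aligned(joint, root, heading))  # ~(1.0, 0.0)
```

In the aligned frame the same "one step ahead" pose always maps to the same state no matter which way the character faces, which is exactly the kind of representational difference the paper measures.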

Applying CEE (CrossEntropyError) to improve performance of Q-Learning algorithm (Q-learning 알고리즘이 성능 향상을 위한 CEE(CrossEntropyError)적용)

  • Kang, Hyun-Gu;Seo, Dong-Sung;Lee, Byeong-seok;Kang, Min-Soo
    • Korean Journal of Artificial Intelligence
    • /
    • v.5 no.1
    • /
    • pp.1-9
    • /
    • 2017
  • Recently, the Q-learning algorithm, one kind of reinforcement learning, has mainly been used to implement artificial intelligence systems in combination with deep learning, and much research aims to improve its performance. The purpose of this study is therefore to improve the performance of the Q-learning algorithm by applying Cross Entropy Error (CEE) to its loss function. Since the mean squared error used in Q-learning makes it difficult to measure the exact error rate, the cross entropy error, known to be highly accurate, is applied to the loss function instead. Experimental results show a success rate of about 12% with the mean squared error used in existing reinforcement learning, versus about 36% with the cross entropy error used in deep learning.
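The two losses compared in the abstract can be sketched as follows. The cross-entropy variant here treats the greedy target action as a class label over a softmax of Q-values; this is one plausible reading, since the abstract does not spell out the authors' exact formulation:

```python
import math

def mse(pred, target):
    # Standard mean squared error over predicted vs target Q-values.
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def cross_entropy(q_values, target_action):
    # Hypothetical adaptation: softmax over Q-values as a distribution,
    # negative log-likelihood of the target (greedy) action as the loss.
    probs = softmax(q_values)
    return -math.log(probs[target_action] + 1e-12)

q = [1.0, 2.0, 0.5]
print(round(mse(q, [1.0, 3.0, 0.5]), 3))  # 0.333
print(round(cross_entropy(q, 1), 3))
```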

Variational Autoencoder-based Assembly Feature Extraction Network for Rapid Learning of Reinforcement Learning (강화학습의 신속한 학습을 위한 변이형 오토인코더 기반의 조립 특징 추출 네트워크)

  • Jun-Wan Yun;Minwoo Na;Jae-Bok Song
    • The Journal of Korea Robotics Society
    • /
    • v.18 no.3
    • /
    • pp.352-357
    • /
    • 2023
  • Since robotic assembly in an unstructured environment is very difficult with existing control methods, studies using artificial intelligence techniques such as reinforcement learning have been conducted. However, long operation of a robot for learning in the real environment adversely affects the robot, so a method to shorten the learning time is needed. To this end, a method based on a pre-trained neural network is proposed in this study. It achieved a learning speed about 3 times faster than existing methods, improved the stability of the reward during learning, and can generate a more optimal policy than learning without a pre-trained network. Using the proposed reinforcement learning-based assembly trajectory generator, 100 attempts were made to assemble a power connector within a random error of 4.53 mm in width and 3.13 mm in length, resulting in 100 successes.
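The variational autoencoder machinery the title refers to rests on two standard pieces, the reparameterization trick and the Gaussian KL term; the sketch below shows both with illustrative two-dimensional values, not the authors' network:

```python
import math
import random

def reparameterize(mu, log_var):
    # z = mu + sigma * eps keeps sampling differentiable during training.
    return [m + math.exp(0.5 * lv) * random.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]

def kl_divergence(mu, log_var):
    # KL(q(z|x) || N(0, I)) for a diagonal Gaussian encoder,
    # the regularizer in the standard VAE objective.
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv)
                      for m, lv in zip(mu, log_var))

mu, log_var = [0.5, 0.0], [0.0, 0.0]
print(kl_divergence(mu, log_var))  # 0.125
latent = reparameterize(mu, log_var)  # compact feature fed to the RL policy
```

After pre-training, only the encoder's latent `z` is passed on, which is how such a feature extractor can shrink the observation the RL agent must learn from.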

Dynamic Action Space Handling Method for Reinforcement Learning Models

  • Woo, Sangchul;Sung, Yunsick
    • Journal of Information Processing Systems
    • /
    • v.16 no.5
    • /
    • pp.1223-1230
    • /
    • 2020
  • Recently, extensive studies have been conducted on applying deep learning to reinforcement learning to solve the state-space problem. If the state-space problem were solved, reinforcement learning would become applicable in various fields. For example, users could learn how to dance with a dance-tutorial system by watching and imitating a virtual instructor that performs the optimal dance to the music through reinforcement learning. In this study, we propose a reinforcement learning method in which the action space is dynamically adjusted. Because actions that are never performed or are unlikely to be optimal are not learned and no state space is allocated for them, the learning time can be shortened and the state space reduced. In an experiment, the proposed method showed results similar to those of traditional Q-learning even when its state space was reduced to approximately 0.33% of that of Q-learning, thereby reducing the cost and time required for learning. Traditional Q-learning requires 6 million states for 100,000 learning iterations, whereas the proposed method requires only 20,000; a higher winning rate can thus be achieved in a shorter time by searching 20,000 states instead of 6 million.
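The dynamic-allocation idea can be sketched with a Q-table that only creates entries for states actually visited and for the actions admissible in them (a toy environment; the paper's dance-tutorial setting is not reproduced here):

```python
class LazyQTable:
    def __init__(self):
        self.q = {}  # state -> {action: value}, allocated on first visit

    def actions(self, state, admissible):
        # Only currently admissible actions get table entries.
        if state not in self.q:
            self.q[state] = {a: 0.0 for a in admissible}
        return self.q[state]

    def update(self, s, a, r, s_next, admissible_next, lr=0.1, gamma=0.9):
        nxt = self.actions(s_next, admissible_next)
        target = r + gamma * (max(nxt.values()) if nxt else 0.0)
        self.q[s][a] += lr * (target - self.q[s][a])

table = LazyQTable()
table.actions("s0", admissible=["left", "right"])
table.update("s0", "right", 1.0, "s1", admissible_next=["left"])
print(len(table.q))  # 2 states allocated, not the full state space
```

Memory grows with the states the agent actually reaches, which is the mechanism behind the 20,000-vs-6-million comparison in the abstract.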

Comparison of Activation Functions of Reinforcement Learning in OpenAI Gym Environments (OpenAI Gym 환경에서 강화학습의 활성화함수 비교 분석)

  • Myung-Ju Kang
    • Proceedings of the Korean Society of Computer Information Conference
    • /
    • 2023.01a
    • /
    • pp.25-26
    • /
    • 2023
  • In this paper, an agent was trained with reinforcement learning on the CartPole-v1 task provided by the OpenAI Gym environment, and the performance of the activation functions used in training was compared and analyzed. The activation functions considered are Sigmoid, ReLU, LeakyReLU, and Softplus, and the reward obtained when each was applied to DQN (Deep Q-Networks) reinforcement learning was compared. The experimental results show that the highest reward was obtained with the ReLU activation function.
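The four activations compared in the paper are standard functions; for reference, here are their definitions (the LeakyReLU slope is a common default, not taken from the paper):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def relu(x):
    return max(0.0, x)

def leaky_relu(x, slope=0.01):
    return x if x > 0 else slope * x

def softplus(x):
    # A smooth approximation of ReLU.
    return math.log(1.0 + math.exp(x))

for f in (sigmoid, relu, leaky_relu, softplus):
    print(f.__name__, round(f(-1.0), 4), round(f(1.0), 4))
```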

Deep reinforcement learning for optimal life-cycle management of deteriorating regional bridges using double-deep Q-networks

  • Xiaoming, Lei;You, Dong
    • Smart Structures and Systems
    • /
    • v.30 no.6
    • /
    • pp.571-582
    • /
    • 2022
  • Optimal life-cycle management is a challenging issue for deteriorating regional bridges. Due to the complexity of regional bridge structural conditions and the large number of inspection and maintenance actions, decision-makers generally choose traditional passive management strategies, which are less efficient and cost-effective. This paper proposes a deep reinforcement learning framework employing double-deep Q-networks (DDQNs) to improve the life-cycle management of deteriorating regional bridges. It produces optimal maintenance plans under restrictions so as to maximize maintenance cost-effectiveness to the greatest extent possible. The DDQN method handles the overestimation of Q-values that occurs in the nature DQN. This study also identifies regional bridge deterioration characteristics and the consequences of scheduled maintenance from years of inspection data. To validate the proposed method, a case study containing hundreds of bridges is used to develop optimal life-cycle management strategies. The optimized solutions recommend fewer replacement actions and prefer preventive repair actions when bridges are damaged or expected to be damaged. By employing the optimal life-cycle regional maintenance strategies, bridge conditions can be kept at a good level. Compared to the nature DQN, the DDQN offers an optimized scheme containing fewer low-condition bridges and a more cost-effective life-cycle management plan.
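The overestimation fix the abstract mentions comes from how the two methods build their bootstrap targets: DDQN selects the next action with the online network but evaluates it with the target network. The Q-values below are illustrative stand-ins for network outputs:

```python
def dqn_target(r, q_target_next, gamma=0.95):
    # Vanilla DQN: the target net both selects and evaluates,
    # so any overestimated action dominates the max.
    return r + gamma * max(q_target_next)

def ddqn_target(r, q_online_next, q_target_next, gamma=0.95):
    # Double DQN: online net selects, target net evaluates.
    a_star = q_online_next.index(max(q_online_next))
    return r + gamma * q_target_next[a_star]

q_online = [1.0, 2.0]   # online net prefers action 1...
q_target = [5.0, 0.5]   # ...which the target net values modestly
print(dqn_target(1.0, q_target))             # 1 + 0.95 * 5.0 = 5.75
print(ddqn_target(1.0, q_online, q_target))  # 1 + 0.95 * 0.5 = 1.475
```

When one network's noise inflates a Q-value, the decoupling makes it unlikely that the same action is both selected and overvalued, which tempers the target.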

Real-Time Path Planning for Mobile Robots Using Q-Learning (Q-learning을 이용한 이동 로봇의 실시간 경로 계획)

  • Kim, Ho-Won;Lee, Won-Chang
    • Journal of IKEEE
    • /
    • v.24 no.4
    • /
    • pp.991-997
    • /
    • 2020
  • Reinforcement learning has been applied mainly to sequential decision-making problems. In recent years especially, reinforcement learning combined with neural networks has brought successful results in previously unsolved fields. However, reinforcement learning using deep neural networks has the disadvantage of being too complex for immediate use in the field. In this paper, we implemented a path planning algorithm for mobile robots using Q-learning, one of the easiest reinforcement learning algorithms to apply. Since generating the Q-table in advance has obvious limitations, we used real-time Q-learning, which updates the Q-table in real time. By adjusting the exploration strategy, we were able to obtain the learning speed required for real-time Q-learning. Finally, we compared the performance of real-time Q-learning and DQN.
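The real-time variant described here amounts to filling the Q-table on the fly as the robot moves, with an epsilon-greedy exploration strategy. A minimal sketch, on a toy one-dimensional corridor rather than the paper's map:

```python
import random

def choose(q, s, actions, eps):
    # Epsilon-greedy: explore with probability eps or on unseen states.
    if random.random() < eps or s not in q:
        return random.choice(actions)
    return max(q[s], key=q[s].get)

def td_update(q, s, a, r, s_next, actions, lr=0.5, gamma=0.9):
    # States are added to the table only when first encountered.
    q.setdefault(s, {x: 0.0 for x in actions})
    q.setdefault(s_next, {x: 0.0 for x in actions})
    q[s][a] += lr * (r + gamma * max(q[s_next].values()) - q[s][a])

q, actions = {}, ["fwd", "back"]
td_update(q, 2, "fwd", 1.0, 3, actions)
print(q[2]["fwd"])  # 0.5
```

Decaying `eps` over time is the usual way to tune the exploration strategy the abstract mentions, shifting from exploring the map to exploiting the learned path.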

A hidden anti-jamming method based on deep reinforcement learning

  • Wang, Yifan;Liu, Xin;Wang, Mei;Yu, Yu
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.15 no.9
    • /
    • pp.3444-3457
    • /
    • 2021
  • In the field of anti-jamming based on dynamic spectrum access, most methods try to improve the ability to avoid jamming and seldom consider whether the jammer can perceive the user's signal. Although these existing methods work in some anti-jamming scenarios, their long-term performance may degrade when intelligent jammers can learn the user's waveform or decision information from the user's historical activities. Hence, we propose a hidden anti-jamming method that addresses this problem by reducing the jammer's sensing probability. In the proposed method, the action correlation between the user and the jammer is used to evaluate how well the user's actions are hidden, and a deep reinforcement learning framework, including a specific action-correlation calculation and an iterative learning algorithm, is designed to maximize the user's hiding and communication performance simultaneously. Simulation results show that the proposed algorithm significantly reduces the jammer's sensing probability and slightly improves the user's anti-jamming performance compared to existing algorithms based on jamming avoidance.
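One simple proxy for the action correlation the abstract describes is how often the jammer's next channel matches the user's current one; the paper's exact statistic is not given in the abstract, so the measure and the channel sequences below are purely illustrative:

```python
def follow_rate(user_channels, jammer_channels):
    # Fraction of steps where the jammer's next channel equals the
    # user's current channel: high means the user is predictable.
    hits = sum(1 for u, j in zip(user_channels, jammer_channels[1:])
               if u == j)
    return hits / max(1, len(user_channels) - 1)

user = [1, 3, 2, 4, 1]
jammer = [2, 1, 3, 2, 4]  # a reactive jammer trailing the user one step
print(follow_rate(user, jammer))  # 1.0 -> fully predictable, easily sensed
```

A hiding-oriented agent would be rewarded for driving such a correlation down while still maintaining throughput, which is the trade-off the proposed framework optimizes.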

Flexible operation and maintenance optimization of aging cyber-physical energy systems by deep reinforcement learning

  • Zhaojun Hao;Francesco Di Maio;Enrico Zio
    • Nuclear Engineering and Technology
    • /
    • v.56 no.4
    • /
    • pp.1472-1479
    • /
    • 2024
  • Cyber-Physical Energy Systems (CPESs) integrate cyber and hardware components to ensure reliable and safe physical power production and supply. Renewable Energy Sources (RESs) add uncertainty to energy demand that can be handled by flexible operation (e.g., load-following) of a CPES; at the same time, scenarios that could result in severe consequences due to both stochastic component failures and aging of the CPES cyber system (commonly overlooked) must be accounted for in Operation & Maintenance (O&M) planning. In this paper, we use Deep Reinforcement Learning (DRL) to search for the optimal O&M strategy that considers not only the actual health conditions of the system's hardware components and their Remaining Useful Life (RUL), but also the possible accident scenarios caused by hardware failures and cyber-component aging, respectively. The novelty of the work lies in embedding the cyber aging model into the CPES model of production planning and the failure process; this model helps the RL agent, trained with Proximal Policy Optimization (PPO) and Imitation Learning (IL), find the proper rejuvenation timing for the cyber system while accounting for the uncertainty of the cyber-system aging process. An application is provided with regard to the Advanced Lead-cooled Fast Reactor European Demonstrator (ALFRED).
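The PPO algorithm the agent is trained with centers on a clipped surrogate objective; the function below is the standard formula, with illustrative probability ratios and advantages rather than values from the paper:

```python
def ppo_clip_loss(ratio, advantage, eps=0.2):
    # ratio = pi_new(a|s) / pi_old(a|s); clipping keeps each policy
    # update close to the data-collecting policy.
    clipped = max(1.0 - eps, min(1.0 + eps, ratio))
    # The surrogate is maximized, so the loss is its negative.
    return -min(ratio * advantage, clipped * advantage)

print(ppo_clip_loss(1.5, 1.0))   # ratio clipped to 1.2 -> loss -1.2
print(ppo_clip_loss(0.5, -1.0))  # ratio clipped to 0.8 -> loss 0.8
```

Imitation learning typically supplies the initial policy that PPO then refines, which matches the PPO-plus-IL training the abstract describes.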