• Title/Summary/Keyword: DQN


Multicast Tree Generation using Meta Reinforcement Learning in SDN-based Smart Network Platforms

  • Chae, Jihun;Kim, Namgi
    • KSII Transactions on Internet and Information Systems (TIIS) / v.15 no.9 / pp.3138-3150 / 2021
  • Multimedia services on the Internet are continuously increasing, and with them the demand for technology that delivers multimedia traffic efficiently. The multicast technique, which delivers the same content to several destinations, is under constant development. It delivers content from a source to all destinations through a multicast tree, and a low-cost tree increases the utilization of network resources. However, finding the optimal multicast tree with minimum link cost is very difficult: the problem is equivalent to the Steiner tree problem, which is NP-complete. Therefore, an effective way to obtain a low-cost multicast tree with little computation time is needed on SDN-based smart network platforms. In this paper, we propose a new multicast tree generation algorithm that produces a multicast tree using an agent trained by model-based meta reinforcement learning. Experiments verified that the proposed algorithm generated multicast trees in less time than existing approximation algorithms, and that it produced lower-cost multicast trees in a dynamic network environment than the previous DQN-based algorithm.
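The abstract does not name the baseline approximation algorithms; a common choice for the Steiner tree problem it mentions is the metric-closure 2-approximation, sketched below in plain Python (the graph, weights, and terminal set are illustrative, not from the paper):

```python
import heapq

def dijkstra(adj, src):
    # Single-source shortest-path distances over a weighted adjacency dict.
    dist = {src: 0}
    pq = [(0, src)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist.get(u, float("inf")):
            continue
        for v, w in adj[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(pq, (nd, v))
    return dist

def steiner_tree_cost(adj, terminals):
    # 2-approximation: Prim-style MST over the metric closure of the terminals,
    # i.e. connect terminals using shortest-path distances between them.
    closure = {t: dijkstra(adj, t) for t in terminals}
    in_tree = {terminals[0]}
    cost = 0
    while len(in_tree) < len(terminals):
        w, v = min(
            (closure[u][v], v)
            for u in in_tree for v in terminals if v not in in_tree
        )
        cost += w
        in_tree.add(v)
    return cost
```

The learned agent aims to produce trees of comparable cost faster than heuristics of this kind.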

A Comparison of Deep Reinforcement Learning and Deep learning for Complex Image Analysis

  • Khajuria, Rishi;Quyoom, Abdul;Sarwar, Abid
    • Journal of Multimedia Information System / v.7 no.1 / pp.1-10 / 2020
  • Image analysis is an important and predominant task for classifying the different parts of an image. The analysis of complex images such as histopathological slides is a crucial factor in oncology because of its ability to help pathologists interpret images, and various feature extraction techniques have evolved over time for such analysis. Although deep reinforcement learning is a new and emerging technique, little effort has been made to compare deep learning and deep reinforcement learning for image analysis. This paper highlights how the two techniques differ in feature extraction from complex images and discusses their potential pros and cons. Convolutional Neural Networks (CNNs) are important for image segmentation, tumour detection and diagnosis, and feature extraction, but several challenges must be overcome before deep learning can be applied to digital pathology: the availability of sufficient training examples in medical image datasets, feature extraction from the whole image area, ground-truth localized annotations, adversarial effects of input representations, and the extremely large size of digital pathology slides (in gigabytes). Formulating Histopathological Image Analysis (HIA) as a Multiple Instance Learning (MIL) problem is a remarkable step, in which a histopathological image is divided into high-resolution patches, predictions are made per patch, and the patch predictions are then combined into an overall slide prediction; however, it suffers from loss of contextual and spatial information. In such cases, deep reinforcement learning techniques can be used to learn features from limited data without losing contextual and spatial information.
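The patch-then-combine step of MIL described above can be sketched as follows; the two aggregation modes and the threshold are illustrative assumptions, not the exact method discussed in the paper:

```python
def aggregate_patches(patch_scores, mode="max", threshold=0.5):
    """Combine per-patch tumour probabilities into one slide-level prediction.

    mode="max"  -> standard MIL assumption: the slide is positive
                   if any single patch is positive.
    mode="mean" -> average pooling, less sensitive to one noisy patch.
    """
    if mode == "max":
        slide_score = max(patch_scores)
    elif mode == "mean":
        slide_score = sum(patch_scores) / len(patch_scores)
    else:
        raise ValueError(f"unknown mode: {mode}")
    return slide_score, slide_score >= threshold
```

Either pooling discards where in the slide the positive patch lay, which is the loss of spatial and contextual information the abstract criticizes.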

Power Trading System through the Prediction of Demand and Supply in Distributed Power System Based on Deep Reinforcement Learning (심층강화학습 기반 분산형 전력 시스템에서의 수요와 공급 예측을 통한 전력 거래시스템)

  • Lee, Seongwoo;Seon, Joonho;Kim, Soo-Hyun;Kim, Jin-Young
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.21 no.6 / pp.163-171 / 2021
  • In this paper, an energy transaction system is optimized by applying a resource allocation algorithm and deep reinforcement learning in a distributed power system. The power demand and supply environment is predicted by deep reinforcement learning. We propose a system that pursues common interests in power trading and increases the efficiency of long-term power transactions amid the paradigm shift from conventional centralized to distributed power systems. For a realistic energy simulation model and environment, we construct the energy market by learning weather and monthly patterns with added Gaussian noise. Simulation results confirm that the proposed power trading agents cooperate with each other, pursue common interests, and increase profits over prolonged energy transactions.
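The paper's exact market model is not given here; a minimal sketch of the kind of monthly demand pattern with additive Gaussian noise the abstract describes might look like this (all parameter values and the seasonal shape are assumptions):

```python
import math
import random

def synthetic_demand(months=12, base=100.0, amplitude=30.0,
                     noise_std=5.0, seed=0):
    # Seasonal (annual-cycle) demand pattern plus additive Gaussian noise.
    # The cosine peaks at month 0, standing in for a heating-load peak.
    rng = random.Random(seed)
    series = []
    for m in range(months):
        seasonal = amplitude * math.cos(2 * math.pi * m / 12)
        series.append(base + seasonal + rng.gauss(0, noise_std))
    return series
```

A trading agent would then be trained against trajectories drawn from such a generator rather than against a single fixed demand curve.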

Max-Mean N-step Temporal-Difference Learning Using Multi-Step Return (멀티-스텝 누적 보상을 활용한 Max-Mean N-Step 시간차 학습)

  • Hwang, Gyu-Young;Kim, Ju-Bong;Heo, Joo-Seong;Han, Youn-Hee
    • KIPS Transactions on Computer and Communication Systems / v.10 no.5 / pp.155-162 / 2021
  • n-step TD learning is a combination of the Monte Carlo method and one-step TD learning. If an appropriate n is selected, n-step TD learning is known to perform better than both the Monte Carlo method and 1-step TD learning, but selecting the best value of n is difficult. To address this difficulty, in this paper we exploit two observations: overestimation of Q can improve performance in early learning, and all n-step returns have similar values when Q ≈ Q*. We propose a new learning target composed of the maximum and the mean of all k-step returns for 1 ≤ k ≤ n. Finally, in OpenAI Gym's Atari game environment, we compare the proposed algorithm with n-step TD learning and show that it outperforms the n-step TD learning algorithm.
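The proposed target can be sketched as follows. The abstract says the target is composed of the maximum and the mean of all k-step returns; averaging the two here is an assumption about the exact combination, not the paper's stated formula:

```python
def k_step_return(rewards, bootstrap_q, k, gamma=0.99):
    # G_k = r_1 + gamma*r_2 + ... + gamma^{k-1}*r_k + gamma^k * Q(s_k, a),
    # where bootstrap_q[k-1] is the Q estimate at the state reached after k steps.
    g = sum(gamma ** i * rewards[i] for i in range(k))
    return g + gamma ** k * bootstrap_q[k - 1]

def max_mean_target(rewards, bootstrap_q, n, gamma=0.99):
    # Build all k-step returns for 1 <= k <= n, then combine their
    # maximum (optimistic, helps early learning) and their mean.
    returns = [k_step_return(rewards, bootstrap_q, k, gamma)
               for k in range(1, n + 1)]
    return 0.5 * (max(returns) + sum(returns) / len(returns))
```

Near convergence all k-step returns agree (Q ≈ Q*), so max and mean coincide and the target reduces to the ordinary n-step target.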

PGA: An Efficient Adaptive Traffic Signal Timing Optimization Scheme Using Actor-Critic Reinforcement Learning Algorithm

  • Shen, Si;Shen, Guojiang;Shen, Yang;Liu, Duanyang;Yang, Xi;Kong, Xiangjie
    • KSII Transactions on Internet and Information Systems (TIIS) / v.14 no.11 / pp.4268-4289 / 2020
  • Advanced traffic signal timing methods play a very important role in reducing road congestion and air pollution. Many recent studies consider reinforcement learning a superior approach for building traffic light timing schemes: it achieves truly adaptive control by taking real-time traffic information as the state and adjusting the traffic light scheme as the action. However, existing works are inefficient at complex intersections and lack feasibility because most of them adopt traffic light schemes with flexible phase sequences. To address these issues, a novel adaptive traffic signal timing scheme is proposed. It is based on an actor-critic reinforcement learning algorithm and integrates the advanced techniques of proximal policy optimization (PPO) and generalized advantage estimation (GAE). In particular, a new reward function and a simplified state representation are carefully defined, which improve learning efficiency and reduce computational complexity, respectively. Meanwhile, a fixed phase-sequence signal scheme is derived, and a constraint on the variation of successive phase durations is introduced, enhancing feasibility and robustness in field applications. The proposed scheme is verified through field-data-based experiments in both medium and high traffic density scenarios. Simulation results show remarkable improvement in traffic performance as well as learning efficiency compared with existing reinforcement learning-based methods such as 3DQN and DDQN.
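Generalized advantage estimation, one of the techniques the scheme integrates, turns per-step TD residuals into smoothed advantage estimates. A minimal sketch of standard GAE (not the paper's full training pipeline) is:

```python
def gae_advantages(rewards, values, gamma=0.99, lam=0.95):
    # Generalized Advantage Estimation:
    #   delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)
    #   A_t     = sum_{l>=0} (gamma*lam)^l * delta_{t+l}
    # `values` holds V(s_0)..V(s_T), one more entry than `rewards`.
    advantages = [0.0] * len(rewards)
    gae = 0.0
    for t in reversed(range(len(rewards))):  # backward recursion
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        gae = delta + gamma * lam * gae
        advantages[t] = gae
    return advantages
```

These advantages would feed the PPO clipped policy-gradient loss; lam trades off bias (low lam) against variance (high lam).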

A Development of Nurse Scheduling Model Based on Q-Learning Algorithm

  • JUNG, In-Chul;KIM, Yeun-Su;IM, Sae-Ran;IHM, Chun-Hwa
    • Korean Journal of Artificial Intelligence / v.9 no.1 / pp.1-7 / 2021
  • In this paper, we focus on nurse scheduling, a socially significant and problematic task. A nurse schedule must account for three-shift rotation, appropriate placement of experienced staff, fairness of work assignment, and legal labor standards. Because of the complex structure of the schedule, which must reflect these various requirements, in most hospitals the nurse in charge writes it by hand with a great deal of time and effort. This study attempts to automatically create an optimized nurse schedule based on legal labor standards and fairness. We developed an I/O Q-Learning-based model in Python with a web application for automatic nurse scheduling. The model was trained to converge toward a score of 100 on a Fairness Indicator Score (FIS) that considers the Labor Standards Act, work equity, and work preference. Compared with manually written schedules on the FIS, this model showed a work equity index higher by 13.31 points, a work preference index higher by 1.52 points, and an FIS higher by 16.38 points. This study was able to generate nurse schedules automatically with reinforcement learning. In addition, creating the nurse schedule of E hospital with this model reduced the time required from 88 hours to 3 hours. With further refinement of the FIS and the additional use of reinforcement learning techniques such as DQN, CNN, Monte Carlo simulation, and AlphaZero, a more optimized model could be developed.
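The tabular Q-learning update underlying such a model can be sketched as follows; the (schedule-state, shift-assignment) encoding is a hypothetical illustration, not the paper's I/O Q-Learning variant, and the FIS would supply the reward:

```python
from collections import defaultdict

def q_learning_step(Q, state, action, reward, next_state, actions,
                    alpha=0.1, gamma=0.9):
    # Standard tabular Q-learning update:
    #   Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
    best_next = max(Q[(next_state, a)] for a in actions)
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
    return Q[(state, action)]

# Q-values keyed by (schedule-state, shift-assignment);
# defaultdict starts every unseen pair at 0.
Q = defaultdict(float)
```

Here an "action" could be assigning a nurse to a day, evening, or night shift, with the reward reflecting how much the assignment improves fairness and legality of the partial schedule.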

A Study on the Development of Adversarial Simulator for Network Vulnerability Analysis Based on Reinforcement Learning (강화학습 기반 네트워크 취약점 분석을 위한 적대적 시뮬레이터 개발 연구)

  • Kim, Jeongyoon;Park, Jongyoul;Oh, Sang Ho
    • Journal of the Korea Institute of Information Security & Cryptology / v.34 no.1 / pp.21-29 / 2024
  • With the development of ICT and networks, security management of ever-larger IT infrastructure is becoming very difficult. Many companies and public institutions struggle to manage system and network security, and as the complexity of hardware and software grows, it is becoming almost impossible for a person to manage all of it. AI is therefore essential for network security management. However, since operating an attack model in a real network environment is very dangerous, cybersecurity emulation research was conducted through reinforcement learning on an implementation of a realistic network environment. To this end, this study applied reinforcement learning to the network environment, and as learning progressed, the agent accurately identified the network's vulnerabilities. When network vulnerabilities are detected through AI in this way, automated, customized responses become possible.

Decision Support System of Obstacle Avoidance for Mobile Vehicles (다양한 자율주행 이동체에 적용하기 위한 장애물 회피의사 결정 시스템 연구)

  • Kang, Byung-Jun;Kim, Jongwon
    • Journal of the Korea Academia-Industrial cooperation Society / v.19 no.6 / pp.639-645 / 2018
  • This paper develops a decision model that can be applied to autonomous vehicles and other autonomous mobile vehicles. The developed modules have independent configurations for application in various driving environments and are built on a platform that operates them organically. Each module is studied for lane-change decision making and for securing safety through reinforcement learning with deep learning techniques. For an autonomous mobile body changing its driving state, the next operation can be determined only if the speed determination model (according to its functions) and the lane-change decision are correctly defined first. Also, even if all vehicles traveling on a general road were equipped with autonomous driving functions, it would be difficult to account for interactions between vehicles arising from unexpected environmental changes. Considering these factors, we applied the decision model to the platform and studied the lane-change decision system for its implementation. We studied the decision model using a modular learning method to reduce system complexity, shorten learning time, and allow model replacement.