• Title/Summary/Keyword: deep Q-learning network (DQN)

Deep Q-Learning Network Model for Container Ship Master Stowage Plan (컨테이너 선박 마스터 적하계획을 위한 심층강화학습 모형)

  • Shin, Jae-Young;Ryu, Hyun-Seung
    • Journal of the Korean Society of Industry Convergence / v.24 no.1 / pp.19-29 / 2021
  • In port logistics systems, container stowage planning is an important issue for cost-effective efficiency improvements. At present, planners mainly carry out stowage planning manually or semi-automatically. However, as the trend toward super-large container ships continues, it is becoming difficult to compute an efficient stowage plan by hand. With the recent rapid development of artificial intelligence technologies, many studies have applied reinforcement learning to optimization problems. Accordingly, in this paper, we develop and present a Deep Q-Learning Network model for the master stowage planning of container ships.

Random Balance between Monte Carlo and Temporal Difference in off-policy Reinforcement Learning for Less Sample-Complexity (오프 폴리시 강화학습에서 몬테 칼로와 시간차 학습의 균형을 사용한 적은 샘플 복잡도)

  • Kim, Chayoung;Park, Seohee;Lee, Woosik
    • Journal of Internet Computing and Services / v.21 no.5 / pp.1-7 / 2020
  • Deep neural networks (DNNs), which are used as function approximators in reinforcement learning (RL), can in theory produce realistic results. In empirical benchmark work, temporal-difference (TD) learning shows better results than Monte Carlo (MC) learning. However, some previous work shows that MC is better than TD when rewards are very sparse or delayed. Other recent research shows that when the agent observes only partial information from the environment in complex control tasks, MC prediction is superior to TD-based methods. Most of these environments can be treated with 5-step or 20-step Q-learning, where the experiment proceeds without long roll-outs in order to alleviate performance degradation. In other words, for noisy networks, regardless of the controlled roll-out length, it is better to learn with MC, which is more robust to noisy rewards than TD, or with a method almost identical to MC. These studies break with the view that TD is always better than MC, and their results show that combining MC and TD performs better than theory alone would suggest. Therefore, building on these previous results, this study exploits a random balance between TD and MC in RL, without the complicated reward-based formulas used in those studies. Comparing a DQN that uses the random MC-TD mixture with the well-known DQN that uses only TD-based learning, we demonstrate through experiments in OpenAI Gym that even well-performing TD learning benefits from the mixture of TD and MC.
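
The idea of randomly balancing MC and TD targets can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the mixing probability `p_mc`, and the per-sample coin flip are assumptions made here for illustration only.

```python
import random


def mc_return(rewards, gamma):
    """Discounted Monte Carlo return over all remaining rewards of the episode."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


def n_step_td_target(rewards, bootstrap_value, gamma, n):
    """n-step TD target: n discounted rewards plus a bootstrapped value estimate.

    bootstrap_value stands for an estimate of max_a Q(s_{t+n}, a). It is only
    added when the episode continues beyond the n-th reward.
    """
    steps = min(n, len(rewards))
    g = sum((gamma ** k) * rewards[k] for k in range(steps))
    if n < len(rewards):
        g += (gamma ** n) * bootstrap_value
    return g


def mixed_target(rewards, bootstrap_value, gamma=0.99, n=1, p_mc=0.5, rng=random):
    """Pick the MC return with probability p_mc, otherwise the n-step TD target.

    p_mc is a hypothetical mixing parameter, not a value taken from the paper.
    """
    if rng.random() < p_mc:
        return mc_return(rewards, gamma)
    return n_step_td_target(rewards, bootstrap_value, gamma, n)
```

With `p_mc=0.0` this reduces to ordinary n-step TD learning, and with `p_mc=1.0` to pure MC learning, so the TD-only DQN baseline is one end of the same spectrum.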

A Study on the Improvement of Heat Energy Efficiency for Utilities of Heat Consumer Plants based on Reinforcement Learning (강화학습을 기반으로 하는 열사용자 기계실 설비의 열효율 향상에 대한 연구)

  • Kim, Young-Gon;Heo, Keol;You, Ga-Eun;Lim, Hyun-Seo;Choi, Jung-In;Ku, Ki-Dong;Eom, Jae-Sik;Jeon, Young-Shin
    • Journal of Energy Engineering / v.27 no.2 / pp.26-31 / 2018
  • This paper introduces a study on improving the thermal efficiency of district heating user control facilities based on reinforcement learning. As an example, it proposes a general method of constructing a deep Q-learning network (DQN) using deep Q-learning, a model-free reinforcement learning algorithm. It also introduces a big data platform system and an integrated heat management system specialized for the energy field, which are applied to process the large volume of data from IoT sensors installed in many thermal energy control facilities.
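
As a rough sketch of what "model-free" means for a DQN, the generic building blocks below learn only from stored transitions and never from a model of the plant. They are the standard DQN components, not the architecture of this paper; the class and function names are illustrative, and the Q-values are passed in as plain lists where a real agent would use a neural network.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size store of (state, action, reward, next_state, done) transitions."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)  # oldest transitions are dropped first

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size, rng=random):
        return rng.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


def epsilon_greedy(q_values, epsilon, rng=random):
    """Explore with probability epsilon, otherwise take the highest-valued action."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def td_target(reward, next_q_values, done, gamma=0.99):
    """One-step Q-learning target; no bootstrapping from a terminal state."""
    if done:
        return reward
    return reward + gamma * max(next_q_values)
```

In a full agent, `next_q_values` would come from a separate target network, and the online network would be trained to reduce the squared difference between its prediction and `td_target`.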

Trading Strategies Using Reinforcement Learning (강화학습을 이용한 트레이딩 전략)

  • Cho, Hyunmin;Shin, Hyun Joon
    • Journal of the Korea Academia-Industrial cooperation Society / v.22 no.1 / pp.123-130 / 2021
  • With recent developments in computer technology, there has been increasing interest in the field of machine learning. This has also led to a significant increase in real-world business applications of machine learning theory in various sectors. In finance, predicting the future value of financial products has long been a major challenge. Since the 1980s, the finance industry has relied on technical and fundamental analysis for this prediction. For future value prediction models using machine learning, model design is of paramount importance in responding to market variables. Therefore, this paper quantitatively predicts the price movements of individual stocks listed on the KOSPI market using machine learning techniques, specifically reinforcement learning. The DQN and A2C algorithms proposed by Google DeepMind in 2013 are used for reinforcement learning and applied to stock trading strategies. In addition, through experiments, an input value that increases the cumulative profit is selected, and its superiority is verified by comparison with baseline algorithms.

Deep Reinforcement Learning based Tourism Experience Path Finding

  • Kyung-Hee Park;Juntae Kim
    • Journal of Platform Technology / v.11 no.6 / pp.21-27 / 2023
  • In this paper, we introduce a reinforcement learning-based algorithm for personalized tourist path recommendations. The algorithm employs a reinforcement learning agent to explore tourist regions and identify optimal paths that are expected to enhance tourism experiences. The concept of tourism experience is defined through points of interest (POI) located along tourist paths within the tourist area. These metrics are quantified through aggregated evaluation scores derived from reviews submitted by past visitors. In the experimental setup, the foundational learning model used to find tour paths is the Deep Q-Network (DQN). Despite the limited availability of historical tourist behavior data, the agent adeptly learns travel paths by incorporating preference scores of tourist POIs and spatial information of the travel area.

Performance Comparison of Reinforcement Learning Algorithms for Futures Scalping (해외선물 스캘핑을 위한 강화학습 알고리즘의 성능비교)

  • Jung, Deuk-Kyo;Lee, Se-Hun;Kang, Jae-Mo
    • The Journal of the Convergence on Culture Technology / v.8 no.5 / pp.697-703 / 2022
  • Due to the recent economic downturn caused by Covid-19 and the unstable international situation, many investors are choosing the derivatives market as a means of investment. However, the derivatives market carries greater risk than the stock market, and market participants' research on the market is insufficient. Recently, with the development of artificial intelligence, machine learning has been widely used in the derivatives market. In this paper, reinforcement learning, one of the machine learning techniques, is applied to analyze the scalping technique, which trades futures on a timescale of minutes. The data set consists of 21 attributes built from the closing price, moving average line, and Bollinger band indicators of 1-minute and 3-minute data over 6 months, for 4 products selected from the futures products traded at a trading firm. In the experiment, a DNN (deep neural network) model and three reinforcement learning algorithms, namely DQN (Deep Q-Network), A2C (Advantage Actor-Critic), and A3C (Asynchronous Advantage Actor-Critic), were used, and they were trained and verified on a training data set and a test data set. For scalping, the agent chooses between the actions of buying and selling, and the reward is the ratio of the portfolio value resulting from the action. Experimental results show that energy sector products such as Heating Oil and Crude Oil yield relatively high cumulative returns compared to index sector products such as Mini Russell 2000 and Hang Seng Index.
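
One plausible reading of the portfolio-value reward is sketched below. The abstract does not give the exact formula, so the relative-change definition, the cash-plus-position accounting (which ignores margin and transaction costs), and all names here are assumptions.

```python
def portfolio_value(cash, position, price):
    """Mark-to-market value: cash plus signed position (long > 0, short < 0)."""
    return cash + position * price


def step_reward(cash, position, price_now, price_next):
    """Relative change in portfolio value over one time step.

    Assumed reward definition: V(t+1) / V(t) - 1, evaluated after the agent's
    buy or sell action has set the position.
    """
    before = portfolio_value(cash, position, price_now)
    after = portfolio_value(cash, position, price_next)
    return after / before - 1.0
```

For example, a long position of one unit with 90 in cash gains about 1% when the price moves from 10 to 11, while the mirrored short position with 110 in cash loses about 1%.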

Deep Reinforcement Learning-Based Edge Caching in Heterogeneous Networks

  • Choi, Yoonjeong;Lim, Yujin
    • Journal of Information Processing Systems / v.18 no.6 / pp.803-812 / 2022
  • With the increasing number of mobile device users worldwide, using mobile edge computing (MEC) devices close to users for content caching can reduce transmission latency compared with receiving content from a server or the cloud. However, because MEC has limited storage capacity, it is necessary to determine the content types and sizes to be cached. In this study, we investigate a caching strategy that increases the hit ratio from small base stations (SBSs) for mobile users in a heterogeneous network consisting of one macro base station (MBS) and multiple SBSs. If users can access several SBSs, the hit ratio can be improved by reducing duplicate content and increasing the diversity of content across SBSs. We propose a Deep Q-Network (DQN)-based caching strategy that considers time-varying content popularity and content redundancy in multiple SBSs. Content is stored in the SBSs in a divided form using maximum distance separable (MDS) codes to enhance the diversity of the content. Experiments in various environments show that the proposed caching strategy outperforms the other methods in terms of hit ratio.
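
The effect of content diversity on the hit ratio can be shown with a small sketch. It is a simplified illustration, not the paper's system model: it caches whole items rather than MDS-coded fragments, and the data layout and names are assumptions.

```python
def hit_ratio(requests, caches, reachable):
    """Fraction of requests served by at least one SBS the user can reach.

    requests:  list of (user, content) pairs
    caches:    dict mapping SBS id -> set of cached content ids
    reachable: dict mapping user -> list of SBS ids within range
    """
    hits = 0
    for user, content in requests:
        if any(content in caches[sbs] for sbs in reachable[user]):
            hits += 1
    return hits / len(requests)


# One user in range of two SBSs, each able to hold two items.
reachable = {"u1": ["sbs1", "sbs2"]}
requests = [("u1", c) for c in ["A", "B", "C", "D"]]

duplicated = {"sbs1": {"A", "B"}, "sbs2": {"A", "B"}}  # same content twice
diverse = {"sbs1": {"A", "B"}, "sbs2": {"C", "D"}}     # no overlap

ratio_duplicated = hit_ratio(requests, duplicated, reachable)
ratio_diverse = hit_ratio(requests, diverse, reachable)
```

With the same total storage, the non-overlapping placement serves all four requests while the duplicated placement serves only half of them, which is the redundancy effect a caching agent would try to exploit.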

Performance Comparison of Deep Reinforcement Learning based Computation Offloading in MEC (MEC 환경에서 심층 강화학습을 이용한 오프로딩 기법의 성능비교)

  • Moon, Sungwon;Lim, Yujin
    • Proceedings of the Korea Information Processing Society Conference / 2022.05a / pp.52-55 / 2022
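  • With the exponential growth of smart mobile devices in the 5G era, multi-access edge computing (MEC) has emerged as a promising technology. Research on offloading to MEC servers in order to provide computation-intensive services with low latency is attracting attention, particularly for MEC system environments in which task arrival rates and wireless channel conditions are stochastic. In this paper, we propose a deep reinforcement learning-based offloading scheme that allocates computational resources for local execution and transmission power for offloading in order to minimize vehicle power consumption and latency. We compare and analyze the performance of a Deep Deterministic Policy Gradient (DDPG)-based scheme and a Deep Q-network (DQN)-based scheme in terms of vehicle power consumption and queuing delay.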

A DQN-based Two-Stage Scheduling Method for Real-Time Large-Scale EVs Charging Service

  • Tianyang Li;Yingnan Han;Xiaolong Li
    • KSII Transactions on Internet and Information Systems (TIIS) / v.18 no.3 / pp.551-569 / 2024
  • With the rapid development of the electric vehicle (EV) industry, EV charging service is becoming more and more important, especially when air temperature drops suddenly or during holidays, when a large number of EVs seek charging devices (CDs) within a short time. In such scenarios, an inefficient EV charging scheduling algorithm might lead to poor service quality, for example, long queueing times for EVs and unreasonable idle time for charging devices. To deal with this issue, this paper proposes a Deep Q-Network (DQN) based two-stage scheduling method for large-scale EV charging service. Fine-grained states with two carefully designed neural networks are proposed to optimize the sequencing of EVs and charging station (CS) arrangement. Two efficient algorithms are presented to obtain the optimal EV charging scheduling scheme for large-scale EV charging demand. Three case studies show the superiority of our proposal in terms of high service quality (minimized average queuing time of EVs and maximized charging performance at both the EV and CS sides) and greater scheduling efficiency. The code and data are available online.

Development of Interior Self-driving Service Robot Using Embedded Board Based on Reinforcement Learning (강화학습 기반 임베디드 보드를 활용한 실내자율 주행 서비스 로봇 개발)

  • Oh, Hyeon-Tack;Baek, Ji-Hoon;Lee, Seung-Jin;Kim, Sang-Hoon
    • Proceedings of the Korea Information Processing Society Conference / 2018.10a / pp.537-540 / 2018
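  • This paper presents an indoor self-driving service robot that builds a map using ROS (Robot Operating System) on a Jetson_TX2 embedded board and uses SLAM and DQN (Deep Q-Network) to generate movement commands to the destination (target linear velocity and target angular velocity). These commands are sent, together with the current angular velocity measured by a gyro sensor, to a Cortex-M3-based MCU (Micro Controller Unit), which performs PID control using the current linear velocity measured by the encoder motor and the angular velocity measured by the gyro sensor.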
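
The PID control step on the MCU side can be sketched as a textbook discrete controller. The gains, time step, and class name below are placeholders, not values or code from the paper, and an actual Cortex-M3 implementation would more likely be written in C.

```python
class PID:
    """Textbook discrete PID controller for tracking a target velocity."""

    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = None  # no derivative term on the first update

    def update(self, target, measured, dt):
        """Return the control output for one time step of length dt seconds."""
        error = target - measured
        self.integral += error * dt
        if self.prev_error is None:
            derivative = 0.0
        else:
            derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative
```

In a setup like the one described, one such controller could track the target linear velocity using the encoder reading and a second one the target angular velocity using the gyro reading.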