• Title/Summary/Keyword: Deep-Q-Network

Fault-tolerant control system for once-through steam generator based on reinforcement learning algorithm

  • Li, Cheng;Yu, Ren;Yu, Wenmin;Wang, Tianshu
    • Nuclear Engineering and Technology / v.54 no.9 / pp.3283-3292 / 2022
  • This paper proposes an active fault-tolerant control method with incremental actions, based on the Deep Q-Network (DQN) reinforcement learning algorithm, for a once-through steam generator (OTSG) control system subject to sensor faults. We first establish an OTSG model as the environment with which the reinforcement learning agent interacts. The agent chooses an action according to the system state obtained from the pressure sensor; the incremental actions gradually approach the optimal strategy for the current fault, and the agent updates its network using the rewards obtained during the interaction. In this way, the active fault-tolerant control process of the OTSG is transformed into the decision-making process of the reinforcement learning agent. Comparison experiments against a traditional reinforcement learning (RL) algorithm with fixed strategies show that the proposed active fault-tolerant controller controls the system accurately and rapidly under sensor faults, so that the OTSG pressure stabilizes near the set-point value and the OTSG runs normally and stably.
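
As a rough illustration of what an "incremental action" could look like, the Python sketch below treats each discrete action as a small adjustment to a running correction term instead of an absolute control value. This is only one plausible reading of the abstract: the increment sizes, the clipping bounds, and the name `apply_incremental_action` are hypothetical and are not taken from the paper.

```python
# Hypothetical increments: decrease, hold, or increase the correction term.
INCREMENTS = (-0.1, 0.0, 0.1)


def apply_incremental_action(correction, action_index, low=-1.0, high=1.0):
    """Add the increment selected by the agent, then clip to [low, high].

    `correction` is an assumed compensation value applied to the faulty
    sensor reading or control signal; the bounds are illustrative only.
    """
    new_correction = correction + INCREMENTS[action_index]
    return max(low, min(high, new_correction))


if __name__ == "__main__":
    correction = 0.0
    for action in (2, 2, 1, 0):  # example action indices chosen by an agent
        correction = apply_incremental_action(correction, action)
        print(round(correction, 3))
```

Because every step changes the correction by a bounded amount, the controller output moves gradually toward a suitable compensation instead of jumping between fixed strategies.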

Deep Learning-Based, Real-Time, False-Pick Filter for an Onsite Earthquake Early Warning (EEW) System (온사이트 지진조기경보를 위한 딥러닝 기반 실시간 오탐지 제거)

  • Seo, JeongBeom;Lee, JinKoo;Lee, Woodong;Lee, SeokTae;Lee, HoJun;Jeon, Inchan;Park, NamRyoul
    • Journal of the Earthquake Engineering Society of Korea / v.25 no.2 / pp.71-81 / 2021
  • This paper presents a real-time, false-pick filter based on deep learning to reduce false alarms of an onsite Earthquake Early Warning (EEW) system. Most onsite EEW systems use the P-wave to predict the S-wave. It is therefore essential to properly distinguish P-waves from noise or other seismic phases to avoid false alarms. To reduce the false picks that cause false alarms, this study built the EEWNet Part 1 'False-Pick Filter' model based on a Convolutional Neural Network (CNN). Specifically, it modified Pick_FP (Lomax et al.) to generate input data such as the amplitude, velocity, and displacement of three components from 2 seconds before to 2 seconds after the P-wave arrival, in one-second time steps. The model extracts log-mel power spectrum features from this input data, then uses these features to classify P-waves and other signals. The dataset consisted of 3,189,583 samples: 81,394 samples from event data (727 events in the Korean Peninsula, 103 teleseismic events, and 1,734 events in Taiwan) and 3,108,189 samples from continuous data (recorded by seismic stations in South Korea for 27 months from 2018 to 2020). The model was trained with 1,826,357 samples after balancing, then tested on continuous data samples from 2019, filtering more than 99% of the strong false picks that could trigger false alarms. The model was developed as a module for USGS Earthworm and is written in C to operate with minimal computing resources.

Random Balance between Monte Carlo and Temporal Difference in off-policy Reinforcement Learning for Less Sample-Complexity (오프 폴리시 강화학습에서 몬테 칼로와 시간차 학습의 균형을 사용한 적은 샘플 복잡도)

  • Kim, Chayoung;Park, Seohee;Lee, Woosik
    • Journal of Internet Computing and Services / v.21 no.5 / pp.1-7 / 2020
  • Deep neural networks (DNNs), which are used as function approximators in reinforcement learning (RL), can in theory produce realistic results. In empirical benchmarks, temporal-difference (TD) learning shows better results than Monte Carlo (MC) learning. However, some previous works show that MC is better than TD when rewards are very sparse or delayed. Other recent research shows that MC prediction is superior to TD-based methods when the agent only partially observes the environment in complex control tasks. Most of these settings can be regarded as 5-step or 20-step Q-learning, in which the experiment proceeds without long roll-outs in order to alleviate performance degradation. In other words, for networks with noise, regardless of the controlled roll-outs, it is better to learn with MC, which is more robust to noisy rewards than TD, or with a method almost identical to MC. These studies break with the view that TD is better than MC, and their results show that a method combining MC and TD is better than the theoretical one. Therefore, building on those results, this study attempts to exploit a random balance between TD and MC in RL, without the complicated reward-based formulas used in those studies. Comparing a DQN that uses the random mixture of MC and TD with the well-known DQN that uses only TD-based learning, we demonstrate through experiments in OpenAI Gym that even well-performing TD learning benefits from the mixture of TD and MC.
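
To make the idea of a random MC/TD balance concrete, the following Python sketch picks, for each step of a finished episode, either the Monte Carlo return or the one-step TD target as the regression target. The abstract does not give the exact mixing rule, so the per-step coin flip, the probability `p_mc`, and all function names here are assumptions for illustration only.

```python
import random


def mc_returns(rewards, gamma):
    """Discounted Monte Carlo return G_t for every step of one episode."""
    returns = [0.0] * len(rewards)
    g = 0.0
    for t in reversed(range(len(rewards))):
        g = rewards[t] + gamma * g
        returns[t] = g
    return returns


def td_target(reward, next_q_values, done, gamma):
    """One-step TD target: r + gamma * max_a Q(s', a), or r at episode end."""
    if done:
        return reward
    return reward + gamma * max(next_q_values)


def mixed_targets(rewards, next_q_values, gamma=0.99, p_mc=0.5, rng=None):
    """Per step, use the MC return with probability p_mc, else the TD target.

    next_q_values[t] holds the (assumed) Q-value estimates for the state
    reached after step t; the last entry is ignored because the episode ends.
    """
    rng = rng or random.Random()
    g = mc_returns(rewards, gamma)
    last = len(rewards) - 1
    targets = []
    for t, r in enumerate(rewards):
        if rng.random() < p_mc:
            targets.append(g[t])
        else:
            targets.append(td_target(r, next_q_values[t], t == last, gamma))
    return targets
```

Setting `p_mc=0.0` recovers ordinary one-step TD targets and `p_mc=1.0` recovers pure Monte Carlo targets, so the single parameter spans both extremes.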

A DQN-based Two-Stage Scheduling Method for Real-Time Large-Scale EVs Charging Service

  • Tianyang Li;Yingnan Han;Xiaolong Li
    • KSII Transactions on Internet and Information Systems (TIIS) / v.18 no.3 / pp.551-569 / 2024
  • With the rapid development of the electric vehicle (EV) industry, EV charging service is becoming increasingly important, especially when a sudden drop in air temperature or the start of a holiday causes large numbers of EVs to seek charging devices (CDs) within a short time. In such scenarios, an inefficient EV charging scheduling algorithm may lead to poor service quality, for example long queueing times for EVs and unreasonable idle time for charging devices. To deal with this issue, this paper proposes a Deep-Q-Network (DQN) based two-stage scheduling method for large-scale EV charging service. Fine-grained states with two delicate neural networks are proposed to optimize the sequencing of EVs and the charging station (CS) arrangement. Two efficient algorithms are presented to obtain the optimal EV charging scheduling scheme for large-scale EV charging demand. Three case studies show the superiority of the proposal in terms of high service quality (minimized average queueing time of EVs and maximized charging performance at both the EV and CS sides) and greater scheduling efficiency. The authors make the code and data available.

Performance Comparison of Reinforcement Learning Algorithms for Futures Scalping (해외선물 스캘핑을 위한 강화학습 알고리즘의 성능비교)

  • Jung, Deuk-Kyo;Lee, Se-Hun;Kang, Jae-Mo
    • The Journal of the Convergence on Culture Technology / v.8 no.5 / pp.697-703 / 2022
  • Due to the recent economic downturn caused by Covid-19 and the unstable international situation, many investors are choosing the derivatives market as a means of investment. However, the derivatives market carries greater risk than the stock market, and market participants' research on the market is insufficient. Recently, with the development of artificial intelligence, machine learning has been widely used in the derivatives market. In this paper, reinforcement learning, one of the machine learning techniques, is applied to analyze scalping, a technique that trades futures on a timescale of minutes. The data set consists of 21 attributes derived from the closing price, moving average, and Bollinger band indicators of 1-minute and 3-minute data over 6 months, for 4 products selected from the futures traded at a trading firm. In the experiment, a deep neural network (DNN) model and three reinforcement learning algorithms, namely DQN (Deep Q-Network), A2C (Advantage Actor-Critic), and A3C (Asynchronous Advantage Actor-Critic), were trained and verified on training and test data sets. For scalping, the agent chooses between buying and selling, and the ratio of the portfolio value resulting from the action is given as the reward. Experimental results show that energy sector products such as Heating Oil and Crude Oil yield relatively high cumulative returns compared to index sector products such as Mini Russell 2000 and Hang Seng Index.

Development of Interior Self-driving Service Robot Using Embedded Board Based on Reinforcement Learning (강화학습 기반 임베디드 보드를 활용한 실내자율 주행 서비스 로봇 개발)

  • Oh, Hyeon-Tack;Baek, Ji-Hoon;Lee, Seung-Jin;Kim, Sang-Hoon
    • Proceedings of the Korea Information Processing Society Conference / 2018.10a / pp.537-540 / 2018
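  • This paper presents an indoor self-driving service robot that builds a map using ROS (Robot Operating System) on a Jetson_TX2 embedded board. Movement commands to the destination (target linear velocity and target angular velocity), generated using SLAM and DQN (Deep Q-Network), are sent together with the current angular velocity measured by a gyro sensor to a Cortex-M3-based MCU (Micro Controller Unit), which performs PID control using the current linear velocity measured by the encoder motors and the angular velocity measured by the gyro sensor.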
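
The low-level loop this abstract describes, tracking a target velocity from a measured velocity with PID control, can be sketched in a few lines of Python. This is a generic textbook discrete PID controller, not the firmware from the paper; the class name, gains, and time step are hypothetical.

```python
class PID:
    """Minimal discrete PID controller (illustrative gains, no anti-windup)."""

    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = None

    def update(self, target, measured, dt):
        """Return the control output for one time step of length dt seconds."""
        error = target - measured
        self.integral += error * dt
        # No derivative term on the first call, when there is no previous error.
        if self.prev_error is None:
            derivative = 0.0
        else:
            derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


if __name__ == "__main__":
    # One controller per commanded quantity, e.g. linear and angular velocity.
    linear_pid = PID(kp=1.2, ki=0.5, kd=0.05)
    command = linear_pid.update(target=0.30, measured=0.22, dt=0.01)
    print(command)
```

In a setup like the one described, the target values would come from the planner (SLAM and DQN) and the measured values from the encoder and the gyro sensor.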

CAB: Classifying Arrhythmias based on Imbalanced Sensor Data

  • Wang, Yilin;Sun, Le;Subramani, Sudha
    • KSII Transactions on Internet and Information Systems (TIIS) / v.15 no.7 / pp.2304-2320 / 2021
  • Intelligently detecting anomalies in health sensor data streams (e.g., Electrocardiogram, ECG) can improve the development of the E-health industry. The physiological signals of patients are collected through sensors. Timely diagnosis and treatment save medical resources, promote physical health, and reduce complications. However, it is difficult to classify ECG data automatically, because the features of ECGs are difficult to extract and the volume of labeled ECG data is limited, which affects classification performance. In this paper, we propose a Generative Adversarial Network (GAN)-based deep learning framework (called CAB) for heart arrhythmia classification. CAB focuses on improving detection accuracy with a small number of labeled samples, and is trained on class-imbalanced ECG data. Augmenting the ECG data with a GAN model mitigates the impact of data scarcity. After data augmentation, CAB classifies the ECG data using a Bidirectional Long Short-Term Memory Recurrent Neural Network (Bi-LSTM). Experimental results show that CAB performs better than state-of-the-art methods. The overall classification accuracy of CAB is 99.71%. The F1-scores for classifying Normal beats (N), Supraventricular ectopic beats (S), Ventricular ectopic beats (V), Fusion beats (F), and Unclassifiable beats (Q) are 99.86%, 97.66%, 99.05%, 98.57%, and 99.88%, respectively.

A Reinforcement Learning Framework for Autonomous Cell Activation and Customized Energy-Efficient Resource Allocation in C-RANs

  • Sun, Guolin;Boateng, Gordon Owusu;Huang, Hu;Jiang, Wei
    • KSII Transactions on Internet and Information Systems (TIIS) / v.13 no.8 / pp.3821-3841 / 2019
  • Cloud radio access networks (C-RANs) have recently been regarded as a promising concept for future 5G technologies, in which all DSP processors are moved into a central baseband unit (BBU) pool in the cloud, and distributed remote radio heads (RRHs) compress and forward the radio signals received from mobile users to the BBUs through radio links. In such a dynamic environment, automatic decision-making approaches, such as artificial intelligence based deep reinforcement learning (DRL), become imperative in designing new solutions. In this paper, we propose a generic framework of autonomous cell activation and customized physical resource allocation schemes for energy consumption and QoS optimization in wireless networks. We formulate the problem as two models, fractional power control with bandwidth adaptation and full power control with bandwidth allocation, and set up a Q-learning model to satisfy the QoS requirements of users and to achieve low energy consumption with the minimum number of active RRHs under varying traffic demand and network densities. Extensive simulations show the effectiveness of the proposed solution compared to existing schemes.
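
For readers unfamiliar with the Q-learning model this abstract refers to, the Python sketch below shows the standard tabular Q-learning update. It is the generic textbook rule rather than the paper's formulation: the state and action encodings (for example, how RRH activation is represented) and the values of `alpha` and `gamma` are placeholders.

```python
from collections import defaultdict


def q_update(q, state, action, reward, next_state, actions,
             alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning update and return the new Q(s, a).

    Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
    """
    best_next = max(q[(next_state, a)] for a in actions)
    td_error = reward + gamma * best_next - q[(state, action)]
    q[(state, action)] += alpha * td_error
    return q[(state, action)]


if __name__ == "__main__":
    # Hypothetical two-action example: switch a cell "on" or "off".
    actions = ["on", "off"]
    q = defaultdict(float)  # unseen (state, action) pairs default to 0.0
    print(q_update(q, state=0, action="on", reward=1.0,
                   next_state=1, actions=actions))
```

In an energy-saving setting, the reward would typically combine a QoS term with a penalty on the number of active RRHs, though the exact reward design is specific to the paper.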

Trading Strategies Using Reinforcement Learning (강화학습을 이용한 트레이딩 전략)

  • Cho, Hyunmin;Shin, Hyun Joon
    • Journal of the Korea Academia-Industrial cooperation Society / v.22 no.1 / pp.123-130 / 2021
  • With recent developments in computer technology, interest in the field of machine learning has been increasing. This has also led to a significant increase in real-world business applications of machine learning in various sectors. In finance, predicting the future value of financial products has been a major challenge. Since the 1980s, the finance industry has relied on technical and fundamental analysis for this prediction. For future value prediction models using machine learning, model design is of paramount importance in responding to market variables. Therefore, this paper quantitatively predicts the price movements of individual stocks listed on the KOSPI market using machine learning techniques, specifically a reinforcement learning model. The DQN algorithm, proposed by Google DeepMind in 2013, and the A2C algorithm are used for the reinforcement learning and are applied to stock trading strategies. In addition, an input value that increases the cumulative profit is selected through experiments, and its superiority is verified by comparison with other algorithms.

Prediction Technique of Energy Consumption based on Reinforcement Learning in Microgrids (마이크로그리드에서 강화학습 기반 에너지 사용량 예측 기법)

  • Sun, Young-Ghyu;Lee, Jiyoung;Kim, Soo-Hyun;Kim, Soohwan;Lee, Heung-Jae;Kim, Jin-Young
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.21 no.3 / pp.175-181 / 2021
  • This paper analyzes an artificial intelligence-based approach to short-term energy consumption prediction. We employ reinforcement learning algorithms to overcome the limitations of the supervised learning algorithms that are usually used for short-term energy consumption prediction. Supervised learning-based approaches have high complexity because they require contextual information as well as energy consumption data to achieve sufficient performance. We propose a multi-agent deep reinforcement learning algorithm that predicts energy consumption from energy consumption data alone, reducing the complexity of the data and the learning models. The proposed scheme is simulated using public energy consumption data to confirm its performance. It predicts values similar to the actual values except for outlier data.