• Title/Abstract/Keywords: Deep-Q-networks

Search results: 29

DQN 기반 비디오 스트리밍 서비스에서 세그먼트 크기가 품질 선택에 미치는 영향 (The Effect of Segment Size on Quality Selection in DQN-based Video Streaming Services)

  • 김이슬;임경식
    • 한국멀티미디어학회논문지 / Vol. 21 No. 10 / pp.1182-1194 / 2018
  • Dynamic Adaptive Streaming over HTTP (DASH) is expected to evolve to meet the increasing demand for seamless video streaming services in the near future. DASH performance heavily depends on the client's adaptive quality selection algorithm, which is not specified by the standard. Existing algorithms are essentially procedural and cannot easily capture and reflect all variations of dynamic network and traffic conditions across diverse network environments. To solve this problem, this paper proposes a novel quality selection mechanism based on the Deep Q-Network (DQN) model: the DQN-based DASH Adaptive Bitrate (ABR) mechanism. The proposed mechanism adopts a new reward calculation method based on five major performance metrics to reflect the current conditions of networks and devices in real time. In addition, the size of the next video segment to be downloaded is considered as a major learning metric to reflect a variety of video encodings. Experimental results show that the proposed mechanism quickly selects a suitable video quality even in high-error-rate environments, significantly reducing the frequency of quality changes compared to the existing algorithm while improving average video quality during playback.
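
The five-metric reward itself is not given in this listing; as a rough sketch of the QoE-style reward shape such ABR work typically uses (the three terms and all weights below are illustrative assumptions, not the authors' formula):

```python
def abr_reward(bitrate, prev_bitrate, rebuffer_time,
               w_quality=1.0, w_switch=1.0, w_rebuffer=4.3):
    """Toy QoE-style reward for adaptive bitrate selection.

    The paper combines five metrics; this sketch uses three common
    ones (quality, quality switches, rebuffering) with hypothetical
    weights -- not the authors' actual formula.
    """
    quality = w_quality * bitrate                       # reward higher quality
    smoothness_penalty = w_switch * abs(bitrate - prev_bitrate)  # penalize switches
    rebuffer_penalty = w_rebuffer * rebuffer_time       # penalize stalls
    return quality - smoothness_penalty - rebuffer_penalty
```

A DQN agent would receive this scalar after each segment download and learn to pick the bitrate maximizing its discounted sum.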

A3C 기반의 강화학습을 사용한 DASH 시스템 (A DASH System Using the A3C-based Deep Reinforcement Learning)

  • 최민제;임경식
    • 대한임베디드공학회논문지 / Vol. 17 No. 5 / pp.297-307 / 2022
  • The simple procedural segment selection algorithm commonly used in Dynamic Adaptive Streaming over HTTP (DASH) shows severe weaknesses in providing high-quality streaming services over integrated mobile networks composed of various wired and wireless links. A major issue is how to cope properly with dynamically changing underlying network conditions; the key is to make the segment selection algorithm much more adaptive to fluctuations in network traffic. This paper presents a system architecture that replaces the existing procedural segment selection algorithm with a deep reinforcement learning algorithm based on the Asynchronous Advantage Actor-Critic (A3C). The distributed A3C-based deep learning server is designed and implemented to allow multiple clients in different network conditions to stream videos simultaneously, collect learning data quickly, and learn asynchronously, greatly improving learning speed as the number of video clients increases. The performance analysis shows that the proposed algorithm outperforms both the conventional DASH algorithm and the Deep Q-Network algorithm in terms of the user's quality of experience and the speed of deep learning.
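
An A3C update is driven by n-step advantage estimates (discounted returns minus the critic's value estimates). A minimal sketch of that computation, independent of the paper's actual network design:

```python
def n_step_advantages(rewards, values, bootstrap, gamma=0.99):
    """Advantage estimates of the kind an A3C-style actor-critic uses.

    `rewards` and `values` cover one rollout segment; `bootstrap` is
    the critic's value of the state after the segment. This is a
    generic illustration, not the paper's implementation.
    """
    # Discounted returns, computed backwards from the bootstrap value
    returns = []
    R = bootstrap
    for r in reversed(rewards):
        R = r + gamma * R
        returns.append(R)
    returns.reverse()
    # Advantage = return minus the critic's baseline
    return [ret - v for ret, v in zip(returns, values)]
```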

Novel Reward Function for Autonomous Drone Navigating in Indoor Environment

  • Khuong G. T. Diep;Viet-Tuan Le;Tae-Seok Kim;Anh H. Vo;Yong-Guk Kim
    • 한국정보처리학회:학술대회논문집 / 한국정보처리학회 2023년도 추계학술발표대회 / pp.624-627 / 2023
  • Unmanned aerial vehicles are gaining popularity with the development of science and technology and are being used for a wide range of purposes, including surveillance, rescue, delivery of goods, and data collection. In particular, the ability to avoid obstacles during navigation without human oversight is one of the essential capabilities a drone must possess. Many works have addressed this problem by implementing a deep reinforcement learning (DRL) model. The essential core of a DRL model is the reward function. Therefore, this paper proposes a new reward function with an appropriate action space and employs dueling double deep Q-networks to train a drone to navigate in an indoor environment without collision.
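
The paper's reward function is its contribution and is not reproduced in this listing; the sketch below only illustrates the usual ingredients of such a shaped navigation reward (progress toward the goal, a terminal bonus, a collision penalty), with hypothetical weights:

```python
def drone_reward(dist_to_goal, prev_dist, collided, reached,
                 w_progress=1.0, collision_penalty=10.0, goal_reward=10.0):
    """Hypothetical shaped reward for indoor drone navigation.

    All terms and weights are illustrative assumptions, not the
    authors' proposed function.
    """
    if collided:
        return -collision_penalty   # terminal penalty for hitting an obstacle
    if reached:
        return goal_reward          # terminal bonus for reaching the goal
    # Dense shaping term: positive when the drone moved closer this step
    return w_progress * (prev_dist - dist_to_goal)
```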

오프 폴리시 강화학습에서 몬테 칼로와 시간차 학습의 균형을 사용한 적은 샘플 복잡도 (Random Balance between Monte Carlo and Temporal Difference in off-policy Reinforcement Learning for Less Sample-Complexity)

  • 김차영;박서희;이우식
    • 인터넷정보학회논문지 / Vol. 21 No. 5 / pp.1-7 / 2020
  • Deep neural networks used as function approximators in reinforcement learning produce practical results close to theory. In many practical success stories, temporal-difference (TD) learning has shown better results than Monte Carlo (MC) learning. However, some prior studies show that MC outperforms TD in environments where rewards are very sparse or delayed, and also when the agent receives only partial information from the environment. Most of these environments can be viewed as 5-step or 20-step Q-learning settings, in which experiments can proceed without the long rollouts that help reduce performance degradation; a representative case is a noisy network unaffected by long rollouts, where MC, which is robust to temporal errors, or learning nearly identical to MC, gives better results than TD. These prior studies contradict the conventional belief that TD is better than MC; in other words, they show empirically rather than theoretically that a combination of MC and TD, rather than TD alone, performs better. Therefore, building on those results, this study combines MC and TD by a simpler method that randomly balances the two, without the complex reward-specific functions used in the prior work. Simulations on OpenAI Gym demonstrate that a DQN using the proposed random balance of MC and TD outperforms the well-known DQN that uses TD learning only.
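
The proposed random balance between an MC return and a TD target can be sketched as a single uniformly drawn mixing coefficient (a minimal illustration, not the authors' exact implementation):

```python
import random

def mixed_target(rewards, bootstrap_value, gamma=0.99):
    """Randomly balance a Monte Carlo return and a one-step TD target.

    `rewards` is the reward sequence of a finished episode segment and
    `bootstrap_value` the learner's value estimate of the state after
    the first step. The single uniform draw mirrors the paper's idea of
    a simple random balance rather than a tuned schedule.
    """
    # Full Monte Carlo return over the segment
    mc = 0.0
    for r in reversed(rewards):
        mc = r + gamma * mc
    # One-step TD target: first reward plus discounted bootstrap
    td = rewards[0] + gamma * bootstrap_value
    beta = random.random()           # random balance in [0, 1)
    return beta * mc + (1.0 - beta) * td
```

In a DQN, this mixed value would replace the usual TD-only target in the loss against the online network's Q-value.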

심층강화학습 기반 분산형 전력 시스템에서의 수요와 공급 예측을 통한 전력 거래시스템 (Power Trading System through the Prediction of Demand and Supply in Distributed Power System Based on Deep Reinforcement Learning)

  • 이승우;선준호;김수현;김진영
    • 한국인터넷방송통신학회논문지 / Vol. 21 No. 6 / pp.163-171 / 2021
  • This paper presents optimized results for a power trading system in a distributed power system, using deep reinforcement learning to predict the power production environment and the demand and supply, and applying a resource allocation algorithm. In line with the paradigm shift from the conventional centralized power system to a distributed one, the goal is to build a power trading system that pursues mutual benefit in power trading and increases the efficiency of long-term transactions. To create a realistic energy model and environment for deep reinforcement learning, data were generated by analyzing weather and monthly patterns, and Gaussian noise was added during simulation to construct the energy market model. Simulation results confirm that the proposed power trading system is cooperative, pursues mutual benefit, and increases profits in the long term.
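
As an illustration of simulation data built from a periodic pattern plus Gaussian noise, in the spirit of the paper's energy market model (the sinusoidal daily pattern and all parameters below are assumptions; the paper derives its patterns from weather and monthly data):

```python
import math
import random

def synthetic_demand(hours, base=100.0, amplitude=20.0, noise_std=5.0, seed=0):
    """Hypothetical synthetic load profile: a daily sinusoidal pattern
    plus Gaussian noise. Purely illustrative parameters."""
    rng = random.Random(seed)
    return [base
            + amplitude * math.sin(2 * math.pi * h / 24)  # daily cycle
            + rng.gauss(0.0, noise_std)                   # market noise
            for h in range(hours)]
```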

A DQN-based Two-Stage Scheduling Method for Real-Time Large-Scale EVs Charging Service

  • Tianyang Li;Yingnan Han;Xiaolong Li
    • KSII Transactions on Internet and Information Systems (TIIS) / Vol. 18 No. 3 / pp.551-569 / 2024
  • With the rapid development of the electric vehicle (EV) industry, EV charging service becomes more and more important, especially when a sudden drop in air temperature or a public holiday causes large numbers of EVs to seek charging devices (CDs) in a short time. In such scenarios, an inefficient EV charging scheduling algorithm can lead to poor service quality, for example long queueing times for EVs and unreasonable idle time for charging devices. To deal with this issue, this paper proposes a Deep-Q-Network (DQN) based two-stage scheduling method for large-scale EV charging service. Fine-grained states with two delicate neural networks are proposed to optimize the sequencing of EVs and the charging station (CS) arrangement. Two efficient algorithms are presented to obtain the optimal EV charging scheduling scheme for large-scale charging demand. Three case studies show the superiority of our proposal in terms of high service quality (minimized average queueing time of EVs and maximized charging performance at both the EV and CS sides) and greater scheduling efficiency. The code and data are available at THE CODE AND DATA.

A Reinforcement Learning Framework for Autonomous Cell Activation and Customized Energy-Efficient Resource Allocation in C-RANs

  • Sun, Guolin;Boateng, Gordon Owusu;Huang, Hu;Jiang, Wei
    • KSII Transactions on Internet and Information Systems (TIIS) / Vol. 13 No. 8 / pp.3821-3841 / 2019
  • Cloud radio access networks (C-RANs) have recently been regarded as a promising concept for future 5G technologies, in which all DSP processors are moved into a central baseband unit (BBU) pool in the cloud, and distributed remote radio heads (RRHs) compress and forward radio signals received from mobile users to the BBUs through radio links. In such a dynamic environment, automatic decision-making approaches, such as artificial-intelligence-based deep reinforcement learning (DRL), become imperative in designing new solutions. In this paper, we propose a generic framework of autonomous cell activation and customized physical resource allocation schemes for energy consumption and QoS optimization in wireless networks. We formulate the problem as fractional power control with bandwidth adaptation and as full power control and bandwidth allocation models, and set up a Q-learning model to satisfy the QoS requirements of users and to achieve low energy consumption with the minimum number of active RRHs under varying traffic demand and network densities. Extensive simulations show the effectiveness of our proposed solution compared to existing schemes.
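
The Q-learning model at the core of the framework rests on the standard tabular update; a minimal sketch (the state/action encoding and reward design are the paper's own and not reproduced here):

```python
def q_update(Q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """One standard tabular Q-learning update.

    Q is a dict of dicts: Q[state][action] -> value. This is the
    textbook rule, not the paper's specific C-RAN formulation.
    """
    # Bootstrap from the best action available in the next state
    best_next = max(Q[next_state].values())
    # Move the estimate toward the TD target
    Q[state][action] += alpha * (reward + gamma * best_next - Q[state][action])
```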

기계학습을 이용한 염화물 확산계수 예측모델 개발 (Development of Prediction Model of Chloride Diffusion Coefficient using Machine Learning)

  • 김현수
    • 한국공간구조학회논문집 / Vol. 23 No. 3 / pp.87-94 / 2023
  • Chloride is one of the most common threats to reinforced concrete (RC) durability. The alkaline environment of concrete forms a passive layer on the surface of reinforcement bars that protects them from corrosion. However, when the chloride concentration at the reinforcement bar reaches a certain level, the passive protection layer deteriorates, causing corrosion and ultimately reducing the structure's safety and durability. Therefore, understanding and predicting chloride diffusion is important for evaluating the safety and durability of RC structures. In this study, the chloride diffusion coefficient is predicted by machine learning techniques. Various techniques such as multiple linear regression, decision tree, random forest, support vector machine, artificial neural networks, extreme gradient boosting, and k-nearest neighbor were used, and the accuracy of these models was compared. To evaluate accuracy, the root mean square error (RMSE), mean square error (MSE), mean absolute error (MAE), and coefficient of determination (R2) were used as prediction performance indices. The k-fold cross-validation procedure was used to estimate the performance of the machine learning models on data not used during training, and grid search was applied to hyperparameter optimization. Numerical simulation showed that ensemble learning methods such as random forest and extreme gradient boosting successfully predicted the chloride diffusion coefficient, and artificial neural networks also provided accurate results.
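
The four prediction performance indices named above can be computed directly; a small self-contained helper (pure Python, no ML library assumed):

```python
import math

def regression_metrics(y_true, y_pred):
    """Prediction performance indices used in the study:
    RMSE, MSE, MAE and the coefficient of determination R2."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    mae = sum(abs(e) for e in errors) / n
    # R2 compares the residual sum of squares to total variance
    mean_t = sum(y_true) / n
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    r2 = 1.0 - (mse * n) / ss_tot
    return {"rmse": rmse, "mse": mse, "mae": mae, "r2": r2}
```

A perfect predictor gives MSE = 0 and R2 = 1; a predictor no better than the mean of the targets gives R2 = 0.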

경주 활성단층대 및 주변 국가지하수 관측정에서 지진과 수위변동 상관관계 연구 (Relationship between Earthquake and Fluctuation of Water Level in Active Fault Zone and National Groundwater Monitoring Wells of Gyeongju Area)

  • 장현우;정찬호;이용천;이유진;홍진우;김천환;김영석;강태섭
    • 지질공학 / Vol. 30 No. 4 / pp.617-634 / 2020
  • The purpose of this study is to examine the characteristics of groundwater level fluctuations following earthquakes at a groundwater monitoring well installed in the Dangu-ri active fault zone in Gyeongju and at 124 nearby national groundwater monitoring wells. Using the water level data, spatiotemporal distribution characteristics and their correlation with earthquakes were interpreted, and the earthquake effectiveness (ε) and q-factor, which indicate the efficiency (potential) of a monitoring well's response to an earthquake, were calculated to analyze their validity. During the monitoring period, the groundwater level at the Dangu-ri well fell by 83 cm after the E1 earthquake (April 22, 2019, M 3.8; post-seismic) and rose by 76 cm before the E2 earthquake (June 11, 2019, M 2.5; pre-seismic). Spatial distribution analysis over time, using the water level data from the nearby national monitoring wells, shows relatively large fluctuations at wells around the fault zone, and larger fluctuations in bedrock groundwater than in alluvial groundwater, indicating that bedrock groundwater is more favorable for earthquake monitoring. These rises and falls in water level appear to be related to an increase in fractures caused by tensile stress and a decrease in pore space caused by compressive stress exerted on the bedrock by earthquakes. The effective ranges of the earthquake effectiveness (ε) and q-factor of the Dangu-ri well were estimated to be 2.70E-10 to 5.60E-10 and 14.4 to 18.0, respectively.