• Title/Abstract/Keyword: Deep Reinforcement Learning

Search results: 200 items (processing time: 0.03 s)

Performance Analysis of Deep Reinforcement Learning for Crop Yield Prediction

  • 옴마킨;이성근
    • 한국전자통신학회논문지 / Vol. 18, No. 1 / pp. 99-106 / 2023
  • Recently, many studies have applied deep learning to crop yield prediction. Deep learning algorithms have difficulty constructing a linear map between the input data set and the crop prediction result, and their implementation depends positively on the proportion of acquired attributes. Applying deep reinforcement learning to crop yield prediction can compensate for these limitations. This paper analyzes the performance of DQN, Double DQN, and Dueling DQN for improving crop yield prediction. The DQN algorithm suffers from an overestimation problem, whereas Double DQN reduces overestimation and achieves better results. The model proposed in this paper is shown to reduce false decisions and increase prediction accuracy.
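The overestimation issue mentioned above comes down to how the bootstrap target is computed. A minimal sketch (not the authors' code; network and variable names are illustrative) of the DQN target versus the Double DQN target that the paper compares:

```python
import torch

def dqn_target(reward, next_state, done, target_net, gamma=0.99):
    # Vanilla DQN: the target network both selects and evaluates the next
    # action, which is the source of the overestimation bias discussed above.
    with torch.no_grad():
        max_q = target_net(next_state).max(dim=1).values
    return reward + gamma * (1.0 - done) * max_q

def double_dqn_target(reward, next_state, done, online_net, target_net, gamma=0.99):
    # Double DQN: the online network selects the action and the target
    # network evaluates it, which reduces the overestimation.
    with torch.no_grad():
        best_action = online_net(next_state).argmax(dim=1, keepdim=True)
        q_eval = target_net(next_state).gather(1, best_action).squeeze(1)
    return reward + gamma * (1.0 - done) * q_eval
```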

A Comparative Analysis of Reinforcement Learning Activation Functions for Parking of Autonomous Vehicles

  • 이동철
    • 한국인터넷방송통신학회논문지 / Vol. 22, No. 6 / pp. 75-81 / 2022
  • Autonomous vehicles, which could dramatically relieve the shortage of parking space, have been advancing rapidly through deep reinforcement learning. Deep reinforcement learning relies on activation functions, and although many activation functions have been proposed, their performance varies widely with the application environment. Finding the optimal activation function for a given environment is therefore important for effective learning. This paper analyzes twelve activation functions commonly used in reinforcement learning to evaluate which is most effective when an autonomous vehicle uses deep reinforcement learning to learn parking. To this end, a performance evaluation environment was built, and the average reward of each activation function was compared along with the success rate, episode length, and vehicle speed. The highest reward was obtained with GELU, and the lowest with ELU; the difference in reward between the two activation functions was 35.2%.
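Since only the activation function varies between the compared agents, the setup can be sketched as otherwise-identical networks built from a table of candidate activations. This is a hedged illustration, not the paper's benchmark code; the layer sizes and this particular candidate set are assumptions:

```python
import torch.nn as nn

ACTIVATIONS = {
    "ReLU": nn.ReLU, "LeakyReLU": nn.LeakyReLU, "ELU": nn.ELU,
    "GELU": nn.GELU, "SiLU": nn.SiLU, "Tanh": nn.Tanh,
}

def make_policy_net(obs_dim, act_dim, activation_name, hidden=256):
    # Identical architecture for every candidate; only the activation differs.
    act = ACTIVATIONS[activation_name]
    return nn.Sequential(
        nn.Linear(obs_dim, hidden), act(),
        nn.Linear(hidden, hidden), act(),
        nn.Linear(hidden, act_dim),
    )

# e.g., train one otherwise-identical agent per activation and compare the
# mean episode reward, success rate, episode length, and vehicle speed:
net = make_policy_net(obs_dim=16, act_dim=3, activation_name="GELU")
```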

A Study on Application of Reinforcement Learning Algorithm Using Pixel Data

  • 문새마로;최용락
    • 한국IT서비스학회지 / Vol. 15, No. 4 / pp. 85-95 / 2016
  • Recently, deep learning and machine learning have attracted considerable attention, and many supporting frameworks have appeared. In the artificial intelligence field, a large body of research is underway to apply this knowledge to complex problem solving, which requires applying various learning algorithms and training methods to artificial intelligence systems. In addition, there is a dearth of performance evaluations of decision-making agents. The decision-making agent designed in this research finds optimal solutions with reinforcement learning: it collects raw pixel data observed from dynamic environments and makes decisions by itself based on that data. The agent uses convolutional neural networks to classify the situations it confronts, and the data observed from the environment undergoes preprocessing before being used. This research describes how the convolutional neural networks and the decision-making agent are configured, analyzes learning performance with a value-based algorithm and a policy-based algorithm, Deep Q-Networks and Policy Gradient, sets forth their differences, and demonstrates how the convolutional neural networks affect overall learning performance when pixel data is used. This research is expected to contribute to the improvement of artificial intelligence systems that can efficiently find optimal solutions by using features extracted from raw pixel data.
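The pipeline described here, preprocessing raw frames and feeding a CNN whose features drive either a Deep Q-Network head or a Policy Gradient head, can be sketched as follows. The 84×84 frame-stacking shapes follow a common Atari-style convention and are an assumption, not the paper's exact configuration:

```python
import torch
import torch.nn as nn

def preprocess(frame_rgb: torch.Tensor) -> torch.Tensor:
    # frame_rgb: (H, W, 3) uint8 -> (1, 84, 84) float in [0, 1]
    gray = frame_rgb.float().mean(dim=-1) / 255.0
    gray = gray.unsqueeze(0).unsqueeze(0)  # (1, 1, H, W)
    return nn.functional.interpolate(
        gray, size=(84, 84), mode="bilinear", align_corners=False).squeeze(0)

class PixelEncoder(nn.Module):
    def __init__(self, stacked_frames=4, n_actions=6):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(stacked_frames, 32, 8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, 3, stride=1), nn.ReLU(), nn.Flatten(),
        )
        self.head = nn.Sequential(nn.Linear(64 * 7 * 7, 512), nn.ReLU(),
                                  nn.Linear(512, n_actions))

    def forward(self, x):  # x: (B, stacked_frames, 84, 84)
        # Output is interpreted as Q-values (DQN) or action logits (Policy Gradient).
        return self.head(self.conv(x))
```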

Fault-tolerant control system for once-through steam generator based on reinforcement learning algorithm

  • Li, Cheng;Yu, Ren;Yu, Wenmin;Wang, Tianshu
    • Nuclear Engineering and Technology / Vol. 54, No. 9 / pp. 3283-3292 / 2022
  • Based on the Deep Q-Network (DQN) algorithm of reinforcement learning, an active fault-tolerance method with incremental actions is proposed for the control system of a once-through steam generator (OTSG) with sensor faults. In this paper, we first establish the OTSG model as the interaction environment for the reinforcement learning agent. The agent chooses an action according to the system state obtained from the pressure sensor; the incremental action gradually approaches the optimal strategy for the current fault, and the agent then updates the network using the rewards obtained during the interaction. In this way, the active fault-tolerant control process of the OTSG is transformed into the reinforcement learning agent's decision-making process. Comparison experiments against a traditional reinforcement learning (RL) algorithm with fixed strategies show that the active fault-tolerant controller designed in this paper controls accurately and rapidly under sensor faults, so that the pressure of the OTSG is stabilized near the set-point value and the OTSG runs normally and stably.
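The incremental-action idea can be sketched as the agent choosing a small correction to the control signal at every step rather than an absolute value. The action set, exploration rate, and reward below are illustrative assumptions, not the authors' settings:

```python
import numpy as np

INCREMENTS = np.array([-0.10, -0.01, 0.0, +0.01, +0.10])  # illustrative action set

def step_controller(q_net, state, control_signal, epsilon=0.05):
    # state: features built from the (possibly faulty) pressure sensor reading
    if np.random.rand() < epsilon:
        action = np.random.randint(len(INCREMENTS))   # explore
    else:
        action = int(np.argmax(q_net(state)))         # greedy on Q-values
    control_signal += INCREMENTS[action]              # incremental action:
    return control_signal, action                     # creep toward optimum

def reward(pressure, setpoint):
    # Dense reward: penalize deviation of OTSG pressure from its set-point,
    # so compensating for the sensor fault is what maximizes return.
    return -abs(pressure - setpoint)
```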

Deep Reinforcement Learning-based Distributed Routing Algorithm for Minimizing End-to-end Delay in MANET

  • Choi, Yeong-Jun;Seo, Ju-Sung;Hong, Jun-Pyo
    • 한국정보통신학회논문지 / Vol. 25, No. 9 / pp. 1267-1270 / 2021
  • In this paper, we propose a distributed routing algorithm for mobile ad hoc networks (MANETs) in which mobile devices are utilized as relays for communication between remote source and destination nodes. The objective of the proposed algorithm is to minimize the end-to-end communication delay caused by transmission failures under deep channel fading. At each hop, a node selects the next relaying node by considering the tradeoff between link stability and forward link distance. Based on this feature, we formulate the problem as a partially observable Markov decision process (POMDP) and apply deep reinforcement learning to derive an effective routing strategy for the formulated POMDP. Simulation results show that the proposed algorithm outperforms baseline schemes in terms of average end-to-end delay.
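A hedged sketch of the per-hop decision described above: each node scores its neighbors with a learned Q-function over features capturing the stability-versus-progress tradeoff. The feature definitions are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def dist(a, b):
    return float(np.linalg.norm(np.asarray(a) - np.asarray(b)))

def neighbor_features(node_pos, neighbor_pos, dest_pos):
    # A farther relay advances the packet more (fewer hops) but its link
    # fades more deeply (more retransmissions), so neither extreme
    # minimizes delay by itself; stability is modeled here as decaying
    # with link length.
    progress = dist(node_pos, dest_pos) - dist(neighbor_pos, dest_pos)
    link_stability = np.exp(-dist(node_pos, neighbor_pos) / 100.0)
    return np.array([link_stability, progress])

def select_next_hop(q_net, node_pos, neighbor_positions, dest_pos):
    # q_net maps the feature vector to an estimated (negative) delay-to-go.
    scores = [q_net(neighbor_features(node_pos, p, dest_pos))
              for p in neighbor_positions]
    return int(np.argmax(scores))
```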

Methodology for Apartment Space Arrangement Based on Deep Reinforcement Learning

  • Cheng Yun Chi;Se Won Lee
    • Architectural Research / Vol. 26, No. 1 / pp. 1-12 / 2024
  • This study introduces a deep reinforcement learning (DRL)-based methodology for optimizing apartment space arrangements, addressing the limitations of human capability in evaluating all potential spatial configurations. Leveraging computational power, the methodology facilitates the autonomous exploration and evaluation of innovative layout options, considering architectural principles, legal standards, and client requirements. Through comprehensive simulation tests across various apartment types, the research demonstrates the DRL approach's effectiveness in generating efficient spatial arrangements that align with current design trends and meet predefined performance objectives. The comparative analysis of AI-generated layouts with those designed by professionals validates the methodology's applicability and potential in enhancing architectural design practices by offering novel, optimized spatial configuration solutions.
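One way such an agent can score candidate arrangements is with a reward that encodes the constraints the study mentions. The sketch below is a hedged illustration under simple assumptions (rooms as axis-aligned rectangles, made-up weights), not the paper's actual reward function:

```python
from dataclasses import dataclass

@dataclass
class Room:
    x: float
    y: float
    w: float
    h: float
    target_area: float  # from the client brief / legal minimums

def overlap(a: Room, b: Room) -> float:
    # Overlapping area of two axis-aligned rectangles (0 if disjoint).
    dx = max(0.0, min(a.x + a.w, b.x + b.w) - max(a.x, b.x))
    dy = max(0.0, min(a.y + a.h, b.y + b.h) - max(a.y, b.y))
    return dx * dy

def layout_reward(rooms: list[Room]) -> float:
    # Reward meeting target areas; heavily penalize intersecting rooms.
    area_err = sum(abs(r.w * r.h - r.target_area) for r in rooms)
    overlaps = sum(overlap(a, b)
                   for i, a in enumerate(rooms) for b in rooms[i + 1:])
    return -area_err - 10.0 * overlaps  # the agent maximizes this per episode
```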

Application of Reinforcement Learning in Detecting Fraudulent Insurance Claims

  • Choi, Jung-Moon;Kim, Ji-Hyeok;Kim, Sung-Jun
    • International Journal of Computer Science & Network Security / Vol. 21, No. 9 / pp. 125-131 / 2021
  • Detecting fraudulent insurance claims is difficult due to small and unbalanced data. Some research has been carried out to better cope with various types of fraudulent claims. Nowadays, technology for detecting fraudulent insurance claims is increasingly utilized in the insurance and technology fields, thanks to the use of artificial intelligence (AI) methods in addition to traditional statistical detection and rule-based methods. This study obtained meaningful results for a fraudulent insurance claim detection model based on machine learning (ML) and deep learning (DL) technologies, using fraudulent insurance claim data from previous research. In our search for a method to enhance detection, we investigated the reinforcement learning (RL) method and examined how it could be applied to this task. Since there are few previous cases of applying RL here, we first had to define the essential RL elements based on previous research on anomaly detection. We applied the deep Q-network (DQN) and double deep Q-network (DDQN) to the fraudulent insurance claim detection model and confirmed that it performs better than previous machine learning models.
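Framing classification as RL requires exactly the elements the authors mention: a state (the claim's features), an action (the predicted label), and a reward. A minimal sketch of one common reward design for unbalanced data follows; the reward magnitudes are illustrative assumptions, not the paper's values:

```python
def claim_reward(action: int, true_label: int) -> float:
    # action / true_label: 0 = legitimate, 1 = fraudulent
    if action == true_label:
        return 1.0 if true_label == 1 else 0.1   # catching rare fraud pays most
    return -1.0 if true_label == 1 else -0.1     # missing fraud costs most

# A DQN or DDQN agent then treats each claim as a one-step episode: observe
# the claim's feature vector, choose a label, receive claim_reward, move on.
```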

Collective Navigation Through a Narrow Gap for a Swarm of UAVs Using Curriculum-Based Deep Reinforcement Learning

  • 최명열;신우재;김민우;박휘성;유영빈;이민;오현동
    • 로봇학회논문지 / Vol. 19, No. 1 / pp. 117-129 / 2024
  • This paper introduces collective navigation through a narrow gap using a curriculum-based deep reinforcement learning algorithm for a swarm of unmanned aerial vehicles (UAVs). Collective navigation in complex environments is essential for various applications such as search and rescue, environmental monitoring, and military operations. Conventional methods, which are easily interpretable from an engineering perspective, divide the navigation task into mapping, planning, and control; however, they struggle with increased latency and unmodeled environmental factors. Recently, learning-based methods have addressed these problems by employing an end-to-end framework with neural networks. Nonetheless, most existing learning-based approaches face challenges in complex scenarios, particularly when navigating through a narrow gap or when a leader or informed UAV is unavailable. Our approach uses the information of a fixed number of nearest neighboring UAVs and incorporates a task-specific curriculum to reduce learning time and train a robust model. The effectiveness of the proposed algorithm is verified through an ablation study and quantitative metrics. Simulation results demonstrate that our approach outperforms existing methods.
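A hedged sketch of the task-specific curriculum idea: the gap narrows only after the swarm masters the current stage, since starting with the hardest gap from scratch tends to stall learning. The widths and promotion threshold are illustrative assumptions, not the paper's settings:

```python
GAP_WIDTHS = [8.0, 5.0, 3.0, 1.5]   # meters, easy -> hard stages

def update_curriculum(stage: int, recent_success_rate: float,
                      promote_at: float = 0.9) -> tuple[int, float]:
    # Promote to a narrower gap only once the current stage is reliably
    # solved, shortening training and yielding a more robust final policy.
    if recent_success_rate >= promote_at and stage < len(GAP_WIDTHS) - 1:
        stage += 1
    return stage, GAP_WIDTHS[stage]
```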

Application of Deep Recurrent Q Network with Dueling Architecture for Optimal Sepsis Treatment Policy

  • Do, Thanh-Cong;Yang, Hyung Jeong;Ho, Ngoc-Huynh
    • 스마트미디어저널 / Vol. 10, No. 2 / pp. 48-54 / 2021
  • Sepsis is one of the leading causes of mortality globally, and it costs billions of dollars annually. However, treating septic patients is currently highly challenging, and more research is needed into a general treatment method for sepsis. Therefore, in this work, we propose a reinforcement learning method for learning optimal treatment strategies for septic patients. We model the patients' physiological time series data as the input to a deep recurrent Q-network that learns reliable treatment policies. We evaluate our model using an off-policy evaluation method, and the experimental results indicate that it outperforms the physicians' policy, reducing patient mortality by up to 3.04%. Thus, our model can be used as a tool to reduce patient mortality by supporting clinicians in making dynamic decisions.
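A dueling recurrent Q-network of the kind applied here combines an LSTM over the physiological time series with separate value and advantage heads. This is a hedged sketch; the layer sizes and the 25-action treatment space (a common discretization in the sepsis-RL literature) are assumptions, not the paper's exact model:

```python
import torch
import torch.nn as nn

class DuelingDRQN(nn.Module):
    def __init__(self, obs_dim, n_actions, hidden=128):
        super().__init__()
        self.lstm = nn.LSTM(obs_dim, hidden, batch_first=True)
        self.value = nn.Linear(hidden, 1)              # state value V(s)
        self.advantage = nn.Linear(hidden, n_actions)  # advantages A(s, a)

    def forward(self, obs_seq):  # obs_seq: (B, T, obs_dim)
        h, _ = self.lstm(obs_seq)
        h = h[:, -1]             # last hidden state summarizes the history
        v, a = self.value(h), self.advantage(h)
        return v + a - a.mean(dim=1, keepdim=True)  # dueling aggregation

q = DuelingDRQN(obs_dim=48, n_actions=25)(torch.randn(2, 10, 48))  # -> (2, 25)
```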

Energy-Efficient DNN Processor on Embedded Systems for Spontaneous Human-Robot Interaction

  • Kim, Changhyeon;Yoo, Hoi-Jun
    • Journal of Semiconductor Engineering / Vol. 2, No. 2 / pp. 130-135 / 2021
  • Recently, deep neural networks (DNNs) have been actively used for action control so that autonomous systems, such as robots, can perform human-like behaviors and operations. Unlike recognition tasks, real-time operation is essential in action control, and remote learning on a server reached through a network is too slow. New learning techniques, such as reinforcement learning (RL), are needed to determine and select the correct robot behavior locally. In this paper, we propose an energy-efficient DNN processor with a LUT-based processing engine and a near-zero skipper. A CNN-based facial emotion recognition model and an RNN-based emotional dialogue generation model are integrated for a natural HRI system and tested with the proposed processor. The processor supports variable weight bit precision from 1 b to 16 b, with 57.6% and 28.5% lower energy consumption than conventional MAC arithmetic units at 1 b and 16 b weight precision, respectively. The near-zero skipper eliminates 36% of MAC operations and reduces energy consumption by 28% in facial emotion recognition tasks. Implemented in a 65 nm CMOS process, the proposed processor occupies a 1784×1784 μm² area and dissipates 0.28 mW and 34.4 mW for facial emotion recognition at 1 fps and 30 fps, respectively.
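The near-zero skipping idea can be illustrated in software; this is a hedged analogue of the hardware behavior, not the chip's implementation, and the threshold is an illustrative assumption. MACs whose weight magnitude falls below the threshold are skipped, removing operations (and hence energy) at a small accuracy cost:

```python
import numpy as np

def mac_with_near_zero_skip(weights, activations, threshold=1e-2):
    mask = np.abs(weights) >= threshold       # keep only significant weights
    result = float(np.dot(weights[mask], activations[mask]))
    skipped_fraction = 1.0 - mask.mean()      # MACs avoided -> energy saved
    return result, skipped_fraction

# e.g., a layer whose weights cluster near zero skips most of its MACs:
w = np.random.randn(1024) * 0.01
x = np.random.randn(1024)
y, frac = mac_with_near_zero_skip(w, x)
```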