• Title/Abstract/Keywords: deep q-learning

85 search results, processing time 0.021 s

Solving Survival Gridworld Problem Using Hybrid Policy Modified Q-Based Reinforcement

  • Montero, Vince Jebryl;Jung, Woo-Young;Jeong, Yong-Jin
    • 전기전자학회논문지 / Vol. 23, No. 4 / pp. 1150-1156 / 2019
  • This paper explores a model-free, value-based approach to solving the survival gridworld problem. The survival gridworld problem poses a challenge that involves taking risks to gain better rewards. The classic value-based approach in model-free reinforcement learning assumes minimal-risk decisions. The proposed method applies hybrid on-policy and off-policy updates to experience roll-outs, using a modified Q-based update equation that introduces a parametric linear rectifier and a motivational discount. The significance of this approach is that it allows model-free training of agents that take risk factors and motivated exploration into account to make better path decisions. Experiments suggest that the proposed method achieves better exploration and path selection, resulting in higher episode scores than classic off-policy and on-policy Q-based updates.
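
For context, the classic off-policy and on-policy Q-based updates used as baselines here are commonly understood to be Q-learning and SARSA. The sketch below shows only those two standard tabular updates. It does not reproduce the paper's modified update, since the abstract does not specify the parametric linear rectifier or the motivational discount. The function names, the `alpha` and `gamma` defaults, and the toy table values are illustrative assumptions.

```python
def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """Off-policy update: bootstrap from the greedy (max-value) action in s_next."""
    target = r + gamma * max(Q[s_next])
    Q[s][a] += alpha * (target - Q[s][a])
    return Q


def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9):
    """On-policy update: bootstrap from the action actually taken in s_next."""
    target = r + gamma * Q[s_next][a_next]
    Q[s][a] += alpha * (target - Q[s][a])
    return Q


# Toy Q-table with 2 states and 2 actions (hypothetical values, not from the paper).
Q_off = [[0.0, 0.0], [1.0, 2.0]]
Q_on = [row[:] for row in Q_off]

# Same transition (s=0, a=0, r=1.0, s_next=1) under both update rules.
q_learning_update(Q_off, s=0, a=0, r=1.0, s_next=1)       # uses max over Q[1]
sarsa_update(Q_on, s=0, a=0, r=1.0, s_next=1, a_next=0)   # uses Q[1][0]

print(Q_off[0][0], Q_on[0][0])
```

The two rules differ only in the bootstrap term: the off-policy target uses the best next action, while the on-policy target uses the next action the agent actually chose, which is why the two tables diverge on the same transition.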

지도학습과 강화학습을 이용한 준능동 중간층면진시스템의 최적설계 (Optimal Design of Semi-Active Mid-Story Isolation System using Supervised Learning and Reinforcement Learning)

  • 강주원;김현수
    • 한국공간구조학회논문집 / Vol. 21, No. 4 / pp. 73-80 / 2021
  • A mid-story isolation system was proposed for seismic response reduction of high-rise buildings and showed good control performance. The control performance of a mid-story isolation system was enhanced by introducing semi-active control devices into the isolation system. The seismic response reduction capacity of a semi-active mid-story isolation system depends mainly on the effectiveness of the control algorithm. In this study, an AI (artificial intelligence)-based control algorithm was developed for control of a semi-active mid-story isolation system. The Shiodome Sumitomo building in Japan, an actual structure with a mid-story isolation system, was used as the example structure. An MR (magnetorheological) damper was used to make the mid-story isolation system of the example model semi-active. In the numerical simulation, a seismic response prediction model was generated with a supervised learning model, an RNN (recurrent neural network). A Deep Q-network (DQN), a reinforcement learning algorithm, was employed to develop the control algorithm. The numerical simulation results showed that the DQN algorithm can effectively control a semi-active mid-story isolation system, resulting in successful reduction of seismic responses.

스마트 제어알고리즘 개발을 위한 강화학습 리워드 설계 (Reward Design of Reinforcement Learning for Development of Smart Control Algorithm)

  • 김현수;윤기용
    • 한국공간구조학회논문집 / Vol. 22, No. 2 / pp. 39-46 / 2022
  • Machine learning has recently been widely used to solve optimization problems in various engineering fields. In this study, machine learning is applied to the development of a control algorithm for a smart control device that reduces seismic responses. For this purpose, a Deep Q-network (DQN), a reinforcement learning algorithm, was employed to develop the control algorithm. A single degree of freedom (SDOF) structure with a smart tuned mass damper (TMD) was used as the example structure. The smart TMD system was composed of an MR (magnetorheological) damper instead of a passive damper. The reward design of reinforcement learning is the main factor affecting the control performance of the smart TMD. Various hyper-parameters were investigated to optimize the control performance of the DQN-based control algorithm. Usually, a smaller time step in numerical simulation is desirable because it increases the accuracy of the simulation results. However, the numerical simulation results showed that a smaller time step for reward calculation might decrease the control performance of the DQN-based control algorithm. Therefore, a proper time step for reward calculation should be selected in the DQN training process.

DQN 기반 비디오 스트리밍 서비스에서 세그먼트 크기가 품질 선택에 미치는 영향 (The Effect of Segment Size on Quality Selection in DQN-based Video Streaming Services)

  • 김이슬;임경식
    • 한국멀티미디어학회논문지 / Vol. 21, No. 10 / pp. 1182-1194 / 2018
  • Dynamic Adaptive Streaming over HTTP (DASH) is expected to evolve to meet the increasing demand for seamless video streaming services in the near future. DASH performance depends heavily on the client's adaptive quality selection algorithm, which is not included in the standard. Existing conventional algorithms are procedural and cannot easily capture and reflect all variations of dynamic network and traffic conditions across a variety of network environments. To solve this problem, this paper proposes a novel quality selection mechanism based on the Deep Q-Network (DQN) model, the DQN-based DASH Adaptive Bitrate (ABR) mechanism. The proposed mechanism adopts a new reward calculation method based on five major performance metrics to reflect the current conditions of networks and devices in real time. In addition, the size of the next consecutive video segment to be downloaded is considered as a major learning metric to reflect a variety of video encodings. Experimental results show that the proposed mechanism quickly selects a suitable video quality even in high error rate environments, significantly reducing the frequency of quality changes compared to the existing algorithm while also improving average video quality during playback.

표정 피드백을 이용한 딥강화학습 기반 협력로봇 개발 (Deep Reinforcement Learning-Based Cooperative Robot Using Facial Feedback)

  • 전해인;강정훈;강보영
    • 로봇학회논문지 / Vol. 17, No. 3 / pp. 264-272 / 2022
  • Human-robot cooperative tasks are increasingly required in daily life with the development of robotics and artificial intelligence technology. Interactive reinforcement learning strategies propose that robots learn tasks by receiving feedback from an experienced human trainer during the training process. However, most previous studies on interactive reinforcement learning have required an extra feedback input device, such as a mouse or keyboard, in addition to the robot itself, and the scenarios in which a robot can interactively learn a task with a human have also been limited to virtual environments. To address these limitations, this paper studies training strategies for a robot that learns table balancing tasks interactively using deep reinforcement learning with human facial expression feedback. In the proposed system, the robot learns a cooperative table balancing task using a Deep Q-Network (DQN), a deep reinforcement learning technique, with human facial emotion expression feedback. In the experiment, the proposed system achieved a high optimal policy convergence rate of up to 83.3% in training and a successful assumption rate of up to 91.6% in testing, showing improved performance compared to the model without human facial expression feedback.

정리정돈을 위한 Q-learning 기반의 작업계획기 (Tidy-up Task Planner based on Q-learning)

  • 양민규;안국현;송재복
    • 로봇학회논문지 / Vol. 16, No. 1 / pp. 56-63 / 2021
  • As the use of robots in service areas increases, research has been conducted on replacing human tasks in daily life with robots. Within this line of work, this study focuses on the task of tidying up a desk using a robot arm. The order in which tidy-up motions are carried out has a great impact on the success rate of the task. Therefore, this study proposes a neural network-based method for determining the priority of the tidy-up motions from the input image. Reinforcement learning, which performs well in sequential decision-making, is used to train the task planner. Training is conducted in a virtual tidy-up environment configured to match the actual tidy-up environment. To transfer the learning results from the virtual environment to the actual environment, the input image is preprocessed into a segmented image. In addition, a neural network that excludes unnecessary tidy-up motions from the priority during the tidy-up operation increases the success rate of the task planner. Experiments were conducted in the real world to verify the proposed task planning method.

다중 교차로에서 협력적 교통신호제어에 대한 연구 (A Study on Cooperative Traffic Signal Control at multi-intersection)

  • 김대호;정옥란
    • 전기전자학회논문지 / Vol. 23, No. 4 / pp. 1381-1386 / 2019
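  • As urban traffic congestion grows more severe, intelligent traffic signal control is being actively studied. Reinforcement learning is the algorithm most actively used for traffic signal control, and deep reinforcement learning algorithms have recently attracted attention. Moreover, as deep reinforcement learning algorithms have shown high performance in various fields, extended versions of deep reinforcement learning have emerged rapidly. However, most existing traffic signal control studies were conducted in single-intersection environments, and methods that relieve congestion only at a single intersection are limited in that they cannot account for traffic conditions across the whole city. This paper proposes cooperative traffic signal control in a multi-intersection environment. The signal control algorithm combines extended versions of deep reinforcement learning, and the traffic conditions of adjacent intersections are taken into account to control multiple intersections efficiently. The experiments compare the proposed algorithm with an existing deep reinforcement learning algorithm and, further, demonstrate high performance by presenting results for models with and without the cooperative method.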

딥 러닝을 이용한 자동 댓글 생성에 관한 연구 (A Study on Automatic Comment Generation Using Deep Learning)

  • 최재용;성소윤;김경철
    • 한국게임학회 논문지 / Vol. 18, No. 5 / pp. 83-92 / 2018
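  • Recently, research results obtained through deep learning in many fields have approached human judgment. In the game industry, the vitality of online communities and SNS has become important enough to determine whether a game succeeds. This study aims to build a system that uses deep learning to be active in online communities and SNS: it reads text written by people online, generates responses to it, and posts them to Twitter according to a schedule. Models were built using a recurrent neural network (RNN) to generate text and to generate a posting schedule, and a program was implemented that, at the generated times, feeds news headlines into the models, receives comments as output, and posts them to Twitter. The results of this study are expected to be applicable to revitalizing online game communities, Q&A services, and similar uses.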

Methodology for Apartment Space Arrangement Based on Deep Reinforcement Learning

  • Cheng Yun Chi;Se Won Lee
    • Architectural research / Vol. 26, No. 1 / pp. 1-12 / 2024
  • This study introduces a deep reinforcement learning (DRL)-based methodology for optimizing apartment space arrangements, addressing the limitations of human capability in evaluating all potential spatial configurations. Leveraging computational power, the methodology facilitates the autonomous exploration and evaluation of innovative layout options, considering architectural principles, legal standards, and client requirements. Through comprehensive simulation tests across various apartment types, the research demonstrates the DRL approach's effectiveness in generating efficient spatial arrangements that align with current design trends and meet predefined performance objectives. The comparative analysis of AI-generated layouts with those designed by professionals validates the methodology's applicability and potential in enhancing architectural design practices by offering novel, optimized spatial configuration solutions.

심층 강화학습 기반의 대학 전공과목 추천 시스템 (Recommendation System of University Major Subject based on Deep Reinforcement Learning)

  • 임덕선;민연아;임동균
    • 한국인터넷방송통신학회논문지 / Vol. 23, No. 4 / pp. 9-15 / 2023
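  • Existing simple statistics-based recommendation systems use only students' course history data, which makes it very difficult to find preferred classes. To address this, this study proposes a personalized major-subject recommendation system based on deep reinforcement learning. The system measures similarity between students based on structured data such as department, year, and course history, and on that basis recommends the most suitable major subjects by comprehensively considering information about each major subject and students' course evaluations. This paper confirms that the DRL-based recommendation system provides useful information for university students choosing major subjects and performs better than a statistics-based recommendation system. In simulations, the deep reinforcement learning-based recommendation system showed about a 20% improvement in course prediction rate over the statistics-based recommendation system. Based on these results, a new system is proposed that provides personalized subject recommendations reflecting students' course evaluations. The system is expected to be of great help to students in finding major subjects that match their preferences and goals.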