• Title/Summary/Keyword: Q 학습 (Q-learning)

A Study on Machine Learning and Basic Algorithms (기계학습 및 기본 알고리즘 연구)

  • Kim, Dong-Hyun;Lee, Tae-ho;Lee, Byung-Jun;Kim, Kyung-Tae;Youn, Hee-Yong
    • Proceedings of the Korean Society of Computer Information Conference / 2018.07a / pp.35-36 / 2018
  • This paper examines machine learning and, among machine-learning techniques, reinforcement learning based on the Markov Decision Process (MDP). Reinforcement learning is a type of machine learning in which a decision-making agent, placed in a given environment, perceives its current state and selects, from the set of available actions, the action that maximizes reward. Unlike general machine learning, reinforcement learning requires no prior knowledge for training, so iterative learning is possible even in uncertain environments. This study briefly describes reinforcement learning in general and Q-learning, the most widely used reinforcement-learning method.

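For reference, the tabular Q-learning update the abstract refers to is conventionally written (notation ours) as

\[
Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t) \right],
\]

where \(\alpha\) is the learning rate, \(\gamma\) the discount factor, and the bracketed term the temporal-difference error. The agent needs no prior model of the environment, which is the model-free property the abstract highlights.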

A Case Study on the Effect of Online Cooperative Learning applied in Accounting Class (온라인 협력학습 회계수업 적용방안 및 효과에 관한 사례연구)

  • Song, Seungah
    • The Journal of the Korea Contents Association / v.22 no.4 / pp.535-546 / 2022
  • This study explored factors for improving academic achievement in online, non-face-to-face education, based on a survey of a university's online cooperative-learning Q&A. With all classes moved online due to the COVID-19 situation, both professors and learners can easily feel psychologically isolated. By sharing a class case in which an online cooperative-learning methodology was applied, the study aims to suggest a direction for future education to teachers and learners. Previous research on non-face-to-face online learning, online cooperative learning, and learning-promotion methods was reviewed, and an online Q&A process was adopted as the specific learning-promotion method. In the Q&A process, learners were given opportunities to check their understanding, share knowledge, and communicate, and they were surveyed about performance-evaluation factors such as guaranteed anonymity for questioners and answerers, an improvement-points system, and absolute evaluation. The survey analysis found these to be success factors of online cooperative learning. Although only a small, practically applicable change, in a future where online non-face-to-face learning is likely to continue, sharing meaningful cases of applied teaching methodology can motivate and actively involve both professors and learners, and is expected to suggest methods and directions for improving learning together.

Generation of Ship's Optimal Route based on Q-Learning (Q-러닝 기반의 선박의 최적 경로 생성)

  • Hyeong-Tak Lee;Min-Kyu Kim;Hyun Yang
    • Proceedings of the Korean Institute of Navigation and Port Research Conference / 2023.05a / pp.160-161 / 2023
  • Currently, a ship's passage planning relies on the navigation officer's knowledge and empirical methods. However, as autonomous ship navigation technology has developed in recent years, automation of passage planning has been studied in various ways. In this study, we generate an optimal route for a ship based on Q-learning, one of the reinforcement learning techniques. Reinforcement learning is applied by training on experience across various situations and making optimal decisions based on that experience.

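A hedged sketch of how Q-learning can generate a route on a discretized chart, in the spirit of the abstract above (the grid, rewards, and greedy-rollout route extraction are our assumptions, not the paper's implementation):

```python
import numpy as np

# Illustrative grid "sea chart": 0 = open water, 1 = obstacle (assumed setup).
grid = np.array([[0, 0, 0, 1],
                 [0, 1, 0, 0],
                 [0, 0, 0, 0]])
rows, cols = grid.shape
moves = [(-1, 0), (1, 0), (0, -1), (0, 1)]   # up, down, left, right
Q = np.zeros((rows * cols, len(moves)))
alpha, gamma, goal = 0.2, 0.95, (2, 3)

def step(pos, a):
    r, c = pos[0] + moves[a][0], pos[1] + moves[a][1]
    if not (0 <= r < rows and 0 <= c < cols) or grid[r, c] == 1:
        return pos, -1.0, False              # blocked: stay put, small penalty
    return (r, c), (10.0 if (r, c) == goal else -0.1), (r, c) == goal

rng = np.random.default_rng(0)
for _ in range(2000):
    pos, done = (0, 0), False
    while not done:
        s = pos[0] * cols + pos[1]
        a = rng.integers(4) if rng.random() < 0.2 else int(np.argmax(Q[s]))
        nxt, rwd, done = step(pos, a)
        s2 = nxt[0] * cols + nxt[1]
        Q[s, a] += alpha * (rwd + gamma * np.max(Q[s2]) - Q[s, a])
        pos = nxt

# Greedy rollout of the learned policy yields the generated route.
pos, route = (0, 0), [(0, 0)]
while pos != goal and len(route) < 20:
    pos, _, _ = step(pos, int(np.argmax(Q[pos[0] * cols + pos[1]])))
    route.append(pos)
print(route)
```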

DQN Reinforcement Learning for Acrobot in OpenAI Gym Environment (OpenAI Gym 환경의 Acrobot에 대한 DQN 강화학습)

  • Myung-Ju Kang
    • Proceedings of the Korean Society of Computer Information Conference / 2023.07a / pp.35-36 / 2023
  • In this paper, the Acrobot-v1 task provided by the OpenAI Gym environment is trained with DQN (Deep Q-Networks) reinforcement learning, and the performance of the activation functions applied is compared and analyzed. The activation functions applied to DQN reinforcement learning are ReLU, LeakyReLU, ELU, SELU, and softplus. The experimental results show that the reward was highest on average when the LeakyReLU activation function was applied, and the maximum reward was obtained when the SELU activation function was applied.

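A compressed, hedged sketch of the setup described above, using the maintained gymnasium fork of OpenAI Gym and PyTorch (hyperparameters and network size are illustrative, not the paper's configuration; a full DQN would also add a target network):

```python
import random
from collections import deque

import gymnasium as gym
import numpy as np
import torch
import torch.nn as nn

env = gym.make("Acrobot-v1")
obs_dim, n_actions = env.observation_space.shape[0], env.action_space.n

# Activation is a parameter so ReLU, LeakyReLU, ELU, SELU, and Softplus can be compared.
def make_qnet(activation=nn.LeakyReLU):
    return nn.Sequential(nn.Linear(obs_dim, 64), activation(),
                         nn.Linear(64, 64), activation(),
                         nn.Linear(64, n_actions))

qnet = make_qnet()
opt = torch.optim.Adam(qnet.parameters(), lr=1e-3)
buffer, gamma, epsilon = deque(maxlen=50_000), 0.99, 0.1

for episode in range(200):
    state, _ = env.reset()
    done = False
    while not done:
        # epsilon-greedy action selection from the current Q-network
        if random.random() < epsilon:
            action = env.action_space.sample()
        else:
            with torch.no_grad():
                action = int(qnet(torch.as_tensor(state, dtype=torch.float32)).argmax())
        next_state, reward, terminated, truncated, _ = env.step(action)
        done = terminated or truncated
        buffer.append((state, action, reward, next_state, float(terminated)))
        state = next_state

        # one gradient step on a minibatch sampled from the replay buffer
        if len(buffer) >= 1_000:
            s, a, r, s2, t = (torch.as_tensor(np.array(x), dtype=torch.float32)
                              for x in zip(*random.sample(buffer, 64)))
            q = qnet(s).gather(1, a.long().unsqueeze(1)).squeeze(1)
            with torch.no_grad():
                target = r + gamma * (1.0 - t) * qnet(s2).max(1).values
            loss = nn.functional.mse_loss(q, target)
            opt.zero_grad(); loss.backward(); opt.step()
```

Swapping `nn.LeakyReLU` for `nn.ReLU`, `nn.ELU`, `nn.SELU`, or `nn.Softplus` in `make_qnet` reproduces the kind of activation-function comparison the abstract describes.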

Doubly-robust Q-estimation in observational studies with high-dimensional covariates (고차원 관측자료에서의 Q-학습 모형에 대한 이중강건성 연구)

  • Lee, Hyobeen;Kim, Yeji;Cho, Hyungjun;Choi, Sangbum
    • The Korean Journal of Applied Statistics / v.34 no.3 / pp.309-327 / 2021
  • Dynamic treatment regimes (DTRs) are decision-making rules designed to provide personalized treatment to individuals in multi-stage randomized trials. Unlike classical methods, in which all individuals are prescribed the same type of treatment, DTRs prescribe patient-tailored treatments that take into account individual characteristics that may change over time. The Q-learning method, one of the regression-based algorithms for finding optimal treatment rules, has become popular because it is easy to implement. However, the performance of the Q-learning algorithm relies heavily on correct specification of the Q-function for the response, especially in observational studies. In this article, we examine a number of doubly-robust weighted least-squares estimation methods for Q-learning in high-dimensional settings, where treatment models for the propensity score and penalization for sparse estimation are also investigated. We further consider flexible ensemble machine-learning methods for the treatment model to achieve double robustness, so that the optimal decision rule is correctly estimated as long as at least one of the outcome model or the treatment model is correct. Extensive simulation studies show that the proposed methods work well with practical sample sizes. The practical utility of the proposed methods is demonstrated with a real-data example.
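
As a hedged illustration of the weighted least-squares Q-learning this line of work builds on (notation ours; the paper's details differ): with an estimated propensity score \(\hat\pi(h) = P(A = 1 \mid H = h)\), a stage's Q-function parameters are estimated by

\[
\hat\beta = \arg\min_{\beta} \sum_{i=1}^{n} \hat w(A_i, H_i)\,\bigl\{ Y_i - Q(H_i, A_i; \beta) \bigr\}^2,
\qquad \hat w(a, h) = \lvert a - \hat\pi(h) \rvert,
\]

and the estimated optimal rule is \(\hat d(h) = \arg\max_a Q(h, a; \hat\beta)\). This weight satisfies the balancing condition \(\pi(h)\,w(1,h) = \{1 - \pi(h)\}\,w(0,h)\), which is what yields double robustness: the decision rule stays consistent if either the outcome model \(Q\) or the treatment model \(\hat\pi\) is correctly specified.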

R-Trader: An Automatic Stock Trading System based on Reinforcement learning (R-Trader: 강화 학습에 기반한 자동 주식 거래 시스템)

  • 이재원;김성동;이종우;채진석
    • Journal of KIISE: Software and Applications / v.29 no.11 / pp.785-794 / 2002
  • Automatic stock trading systems should be able to solve various kinds of optimization problems, such as market trend prediction, stock selection, and trading strategies, in a unified framework. However, most previous trading systems based on supervised learning are limited in their ultimate performance because they do not address the integration of these subproblems. This paper proposes a stock trading system, called R-Trader, based on reinforcement learning, regarding the process of stock price changes as a Markov decision process (MDP). Reinforcement learning is suitable for the joint optimization of predictions and trading strategies. R-Trader adopts two popular reinforcement learning algorithms, temporal-difference (TD) learning and Q-learning, for selecting stocks and optimizing other trading parameters, respectively. Technical analysis is used to devise the input features of the system, and value functions are approximated by feedforward neural networks. Experimental results on the Korean stock market show that the proposed system outperforms both the market average and a simple trading system trained by supervised learning, in terms of both profit and risk management.
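
A hedged sketch of the kind of TD value-function approximation with a feedforward network the abstract describes (the features, network shape, and synthetic data are our assumptions, not R-Trader's design):

```python
import numpy as np
import torch
import torch.nn as nn

# Feedforward network approximating a stock's value from technical-analysis
# style features, trained with a TD(0) target on synthetic data (illustrative).
n_features = 8                        # e.g., moving averages, momentum indicators
vnet = nn.Sequential(nn.Linear(n_features, 32), nn.ReLU(), nn.Linear(32, 1))
opt = torch.optim.Adam(vnet.parameters(), lr=1e-3)
gamma = 0.99

rng = np.random.default_rng(0)
features = torch.tensor(rng.normal(size=(1000, n_features)), dtype=torch.float32)
returns = torch.tensor(rng.normal(scale=0.01, size=(1000,)), dtype=torch.float32)

for t in range(999):
    v_t = vnet(features[t]).squeeze()
    with torch.no_grad():             # bootstrap target from the next state
        td_target = returns[t] + gamma * vnet(features[t + 1]).squeeze()
    loss = (td_target - v_t) ** 2     # squared TD(0) error
    opt.zero_grad(); loss.backward(); opt.step()
```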

Q-learning for Adaptive LQ Suboptimal Control of Discrete-time Switched Linear System (이산 시간 스위칭 선형 시스템의 적응 LQ 준최적 제어를 위한 Q-학습법)

  • Chun, Tae-Yoon;Choi, Yoon-Ho;Park, Jin-Bae
    • Proceedings of the KIEE Conference / 2011.07a / pp.1874-1875 / 2011
  • This paper proposes a Q-learning algorithm for adaptive LQ suboptimal control of discrete-time switched linear systems. The proposed control algorithm is based on an existing Q-learning method with proven stability, and it achieves suboptimal control even when the parameters of the switched-system model are unknown. Building on this algorithm, we address subsystem uncertainty and the optimal adaptive control problem, which previous work on switched systems did not consider, and verify the performance of the proposed algorithm through computer simulations.

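For context, the standard way a Q-function is set up for discrete-time LQ problems (our notation; the paper's switched-system extension is not reproduced here): for dynamics \(x_{k+1} = A x_k + B u_k\) with stage cost \(x_k^\top Q_c x_k + u_k^\top R u_k\), the optimal Q-function is quadratic,

\[
Q(x, u) = \begin{bmatrix} x \\ u \end{bmatrix}^{\!\top} H \begin{bmatrix} x \\ u \end{bmatrix},
\qquad
H = \begin{bmatrix} Q_c + A^\top P A & A^\top P B \\ B^\top P A & R + B^\top P B \end{bmatrix},
\]

where \(P\) solves the associated Riccati equation, and the greedy feedback is \(u = -(R + B^\top P B)^{-1} B^\top P A\, x\). Q-learning estimates \(H\) directly from input-state data, which is why suboptimal control is possible without knowing the model parameters \(A\) and \(B\).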

A Comparative Study on the Korean Text Extractive Summarization using Pre-trained Language Model (사전 학습 언어 모델을 이용한 한국어 문서 추출 요약 비교 분석)

  • Young-Rae Cho;Kwang-Hyun Baek;Min-Ji Park;Byung Hoon Park;Sooyeon Shin
    • Proceedings of the Korea Information Processing Society Conference / 2023.11a / pp.518-521 / 2023
  • Amid today's excess of information, efficiently extracting the important content of digital documents has become a key requirement from a cost-efficiency standpoint. Document summarization, a subfield of natural language processing, is the task of extracting or generating key sentences while preserving the core information of the original document. Of these approaches, extractive summarization reduces both information loss and the risk of generating incorrect information. However, comparisons of the available tokenizers and embedding models, needed to choose among them appropriately, remain insufficient. In this paper, we select Korean pre-trained extractive-summarization language models, train them on an additional dataset, evaluate their performance, and compare and analyze the results.
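
A hedged sketch of one common extractive-summarization scheme consistent with the abstract above (the `embed()` stub is hypothetical and stands in for any pre-trained Korean sentence encoder; scoring by centroid similarity is our choice, not the paper's method):

```python
import numpy as np

# Score each sentence by cosine similarity to the document centroid, keep top-k.
def embed(sentences):
    """Placeholder encoder; replace with a real pre-trained sentence embedder."""
    rng = np.random.default_rng(42)
    return rng.normal(size=(len(sentences), 768))

def extract_summary(sentences, k=3):
    vecs = embed(sentences)
    vecs = vecs / np.linalg.norm(vecs, axis=1, keepdims=True)
    centroid = vecs.mean(axis=0)
    centroid /= np.linalg.norm(centroid)
    scores = vecs @ centroid                  # cosine similarity to the centroid
    top = sorted(np.argsort(scores)[-k:])     # keep original sentence order
    return [sentences[i] for i in top]
```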

A Case Study of Flipped Learning application of Basics Cooking Practice Subject using YouTube (유튜브를 활용한 기초조리실습과목의 플립드러닝 적용사례 연구)

  • Shin, Seoung-Hoon;Lee, Kyung-Soo
    • The Journal of the Korea Contents Association / v.21 no.5 / pp.488-498 / 2021
  • This study applied the flipped-learning teaching method to a Basics Cooking Practice subject using YouTube. Its purpose is to determine whether the curriculum progresses properly, by measuring the effects before and after learning and analyzing learners' subjectivity through the learning process. The investigation was conducted from August 1, 2020 to September 10, 2020. Following the research design of Q methodology, the study proceeded in five stages: Q-sample selection, P-sample selection, Q-sorting, coding and interpretation, and conclusion and discussion. The analysis identified three types: Type 1 (N=5), a prior-learning effect; Type 2 (N=7), a simulated-practice effect; and Type 3 (N=3), a self-efficacy effect. Applying the flipped-learning method to the YouTube-based Basics Cooking Practice subject produced positive effects among active learners, such as greater interest in the class and increased confidence, but some learners lacked understanding of how the class was run. The shortage of practice sessions compared with other subjects also remains a problem to be solved.

Behavior Learning and Evolution of Swarm Robot System using Q-learning and Cascade SVM (Q-learning과 Cascade SVM을 이용한 군집로봇의 행동학습 및 진화)

  • Seo, Sang-Wook;Yang, Hyun-Chang;Sim, Kwee-Bo
    • Journal of the Korean Institute of Intelligent Systems / v.19 no.2 / pp.279-284 / 2009
  • In swarm robot systems, each robot must behave by itself according to its state and environment and, if necessary, cooperate with other robots to carry out a given task. It is therefore essential that each robot have both learning and evolution abilities in order to adapt to dynamic environments. In this paper, a reinforcement learning method using multiple SVMs based on structural risk minimization, combined with a distributed genetic algorithm, is proposed for behavior learning and evolution of collective autonomous mobile robots. Through the distributed genetic algorithm, which exchanges chromosomes acquired under different environments via communication, each robot can improve its behavioral ability. In particular, to improve the performance of evolution, selective crossover based on the characteristics of Cascade-SVM reinforcement learning is adopted.
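
A hedged sketch of the distributed-GA chromosome exchange described above (population sizes, the fitness stub, and uniform crossover are our assumptions; the paper's Cascade-SVM-guided selective crossover is not reproduced):

```python
import numpy as np

# Each "robot" evolves its own population of chromosomes (here, policy
# parameter vectors) and periodically receives a neighbor's best chromosome.
rng = np.random.default_rng(0)
n_robots, chrom_len = 4, 10
pops = [rng.normal(size=(20, chrom_len)) for _ in range(n_robots)]

def fitness(chrom):
    """Stub reward from executing the robot's behavior; replace with task reward."""
    return -np.sum((chrom - 1.0) ** 2)

for generation in range(50):
    for i in range(n_robots):
        pop = pops[i]
        scores = np.array([fitness(c) for c in pop])
        parents = pop[np.argsort(scores)[-10:]]            # truncation selection
        kids = []
        for _ in range(len(pop)):
            a, b = parents[rng.integers(10, size=2)]
            mask = rng.random(chrom_len) < 0.5             # uniform crossover
            kids.append(np.where(mask, a, b) + rng.normal(scale=0.05, size=chrom_len))
        pops[i] = np.array(kids)
    # chromosome exchange: each robot receives a neighbor's best chromosome
    bests = [p[np.argmax([fitness(c) for c in p])] for p in pops]
    for i in range(n_robots):
        pops[i][0] = bests[(i + 1) % n_robots]
```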