• Title/Summary/Keyword: Q 학습 (Q learning)

Search Result 290, Processing Time 0.031 seconds

Q-Learning Policy and Reward Design for Efficient Path Selection (효율적인 경로 선택을 위한 Q-Learning 정책 및 보상 설계)

  • Yong, Sung-Jung;Park, Hyo-Gyeong;You, Yeon-Hwi;Moon, Il-Young
    • Journal of Advanced Navigation Technology / v.26 no.2 / pp.72-77 / 2022
  • Among reinforcement learning techniques, Q-Learning learns an optimal policy by learning a Q function that predicts the expected future reward of taking an action in a given state. Q-Learning is widely used as a basic algorithm for reinforcement learning. In this paper, we studied how effectively efficient paths can be selected and learned by designing policies and rewards based on Q-Learning. We compared the existing algorithm, a punishment compensation policy, and the proposed punishment reinforcement policy by applying the same number of training iterations to each in the 8x8 grid environment of the Frozen Lake game. This comparison showed that the proposed Q-Learning punishment reinforcement policy can significantly increase learning speed compared with the conventional algorithm.
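
As a rough illustration of the kind of setup this abstract describes, the sketch below runs tabular Q-Learning on a small Frozen Lake style grid with a penalty for falling into a hole. The 4x4 map, the deterministic moves, the `hole_penalty` value, and all hyperparameters are assumptions made for illustration; the paper's actual 8x8 environment and its reward design are not given in the abstract.

```python
import random

# Hypothetical 4x4 map in the style of Frozen Lake (not the paper's 8x8 map):
# S = start, F = frozen (safe), H = hole, G = goal.
GRID = ["SFFF",
        "FHFH",
        "FFFH",
        "HFFG"]
N = len(GRID)
# Actions as (row delta, column delta): left, down, right, up.
ACTIONS = [(0, -1), (1, 0), (0, 1), (-1, 0)]


def step(state, action, hole_penalty):
    """Deterministic move (assumption); returns (next_state, reward, done)."""
    row, col = divmod(state, N)
    d_row, d_col = ACTIONS[action]
    row = min(max(row + d_row, 0), N - 1)  # bumping into a wall keeps the agent in place
    col = min(max(col + d_col, 0), N - 1)
    next_state = row * N + col
    cell = GRID[row][col]
    if cell == "G":
        return next_state, 1.0, True
    if cell == "H":
        return next_state, hole_penalty, True  # assumed "punishment" reward
    return next_state, 0.0, False


def train(episodes=2000, alpha=0.1, gamma=0.95, epsilon=0.1,
          hole_penalty=-1.0, seed=0):
    """Tabular Q-Learning with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = [[0.0] * len(ACTIONS) for _ in range(N * N)]
    for _ in range(episodes):
        state = 0
        for _ in range(100):  # cap on episode length
            if rng.random() < epsilon:
                action = rng.randrange(len(ACTIONS))
            else:
                best = max(q[state])
                # Break ties randomly so early episodes still explore.
                action = rng.choice(
                    [a for a, v in enumerate(q[state]) if v == best])
            next_state, reward, done = step(state, action, hole_penalty)
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q


def greedy_path(q, max_steps=50):
    """Follow the greedy policy from the start state and return visited states."""
    state, path = 0, [0]
    for _ in range(max_steps):
        action = max(range(len(ACTIONS)), key=lambda a: q[state][a])
        state, _, done = step(state, action, 0.0)
        path.append(state)
        if done:
            break
    return path
```

Calling `greedy_path(train())` returns the sequence of cell indices the learned policy visits; changing `hole_penalty` is one way to experiment with different reward designs.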

The Development of Janggi Board Game Using Backpropagation Neural Network and Q Learning Algorithm (역전파 신경회로망과 Q학습을 이용한 장기보드게임 개발)

  • 황상문;박인규;백덕수;진달복
    • Journal of the Institute of Electronics Engineers of Korea TE / v.39 no.1 / pp.83-90 / 2002
  • This paper proposes a strategy learning method that fuses a back-propagation neural network with the Q learning algorithm for the two-person, deterministic janggi board game. Learning is accomplished simply by playing games against one another. The system consists of two parts: a move generator and a search kernel. The move generator generates the moves on the board; the search kernel combines back-propagation and Q learning with $\alpha$-$\beta$ search in an attempt to learn the evaluation function. While temporal difference learning learns from the discrepancy between adjacent rewards, Q learning acquires optimal policies even without prior knowledge of the effects of its moves on the environment, by learning the evaluation function for the augmented rewards. Over many games played with the learned evaluation function, the winning percentage proved to be, in general, linearly proportional to the amount of learning.

Extended Q-Learning under Multiple Subtasks (복수의 부분작업을 처리할 수 있는 확정된 Q-Learning)

  • 오도훈;이현숙;오경환
    • Korean Journal of Cognitive Science / v.12 no.1_2 / pp.25-34 / 2001
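  • Artificial intelligence research, which used to focus on managing knowledge, is shifting toward building systems that can adapt to dynamically changing external environments. Among the many learning methods that underpin such systems, reinforcement learning, proposed relatively recently, is easy to apply to general cases and has shown excellent adaptability in dynamic environments. Because of these advantages, reinforcement learning is widely used in agent research. However, results to date show that there is a limit to the difficulty of the tasks that agents built with reinforcement learning can solve. In particular, conventional reinforcement learning methods show limitations when handling a task composed of multiple subtasks. In this paper, we analyze why tasks composed of multiple subtasks are hard to handle and propose a way to handle them. The proposed EQ-Learning solves the problem by extending Q-Learning, a representative reinforcement learning method. The method learns a solution for each subtask and then finds an appropriate ordering of the learned results to solve the whole task. To verify the validity of EQ-Learning, we ran experiments on a maze problem composed of multiple subtasks in a grid space.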

Max-Mean N-step Temporal-Difference Learning Using Multi-Step Return (멀티-스텝 누적 보상을 활용한 Max-Mean N-Step 시간차 학습)

  • Hwang, Gyu-Young;Kim, Ju-Bong;Heo, Joo-Seong;Han, Youn-Hee
    • KIPS Transactions on Computer and Communication Systems / v.10 no.5 / pp.155-162 / 2021
  • n-step TD learning combines the Monte Carlo method with one-step TD learning. With an appropriate n, n-step TD learning is known to outperform both the Monte Carlo method and one-step TD learning, but the best value of n is difficult to select. To address this difficulty, this paper exploits two characteristics: overestimation of Q can improve performance early in learning, and all n-step returns have similar values when Q ≈ Q*. We propose a new learning target composed of the maximum and the mean of all k-step returns for 1 ≤ k ≤ n. Finally, in OpenAI Gym's Atari game environment, we compare the proposed algorithm with n-step TD learning and show that the proposed algorithm is superior.
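
To make the ingredients of such a target concrete, the following sketch computes every k-step return for 1 ≤ k ≤ n from one trajectory segment and then takes their maximum and mean. The function names and the argument layout are hypothetical, and the abstract does not say how the paper combines the maximum and the mean into a single target, so that step is deliberately left out.

```python
def k_step_returns(rewards, bootstrap_values, gamma):
    """Return [G(1), ..., G(n)] for one trajectory segment.

    G(k) = sum_{i=0}^{k-1} gamma**i * rewards[i] + gamma**k * bootstrap_values[k-1]

    rewards[i] is the reward received i+1 steps after time t, and
    bootstrap_values[k-1] is the value estimate used after k steps,
    for example max_a Q(s_{t+k}, a) in a Q-learning setting (assumption).
    """
    returns = []
    discounted_sum = 0.0
    for k, (reward, value) in enumerate(zip(rewards, bootstrap_values), start=1):
        discounted_sum += gamma ** (k - 1) * reward
        returns.append(discounted_sum + gamma ** k * value)
    return returns


def max_and_mean(returns):
    """The two statistics the abstract names: maximum and mean of the k-step returns."""
    return max(returns), sum(returns) / len(returns)
```

For instance, `max_and_mean(k_step_returns(rewards, values, 0.99))` yields the two quantities from which a combined target could be built.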

Design and implementation of Robot Soccer Agent Based on Reinforcement Learning (강화 학습에 기초한 로봇 축구 에이전트의 설계 및 구현)

  • Kim, In-Cheol
    • The KIPS Transactions:PartB / v.9B no.2 / pp.139-146 / 2002
  • The robot soccer simulation game is a dynamic multi-agent environment. In this paper we suggest a new reinforcement learning approach to each agent's dynamic positioning in such an environment. Reinforcement learning is a form of machine learning in which an agent learns, from indirect and delayed reward, an optimal policy for choosing sequences of actions that produce the greatest cumulative reward. It therefore differs from supervised learning in that no input-output pairs are presented as training examples. Furthermore, model-free reinforcement learning algorithms like Q-learning do not require defining or learning any model of the surrounding environment. Nevertheless, these algorithms can learn the optimal policy if the agent can visit every state-action pair infinitely often. However, the biggest problem of monolithic reinforcement learning is that its straightforward applications do not scale up to more complex environments because the state space becomes intractably large. To address this problem, we suggest Adaptive Mediation-based Modular Q-Learning (AMMQL) as an improvement on the existing Modular Q-Learning (MQL). While simple modular Q-learning combines the results from each learning module in a fixed way, AMMQL combines them more flexibly by assigning a different weight to each module according to its contribution to rewards. Therefore, in addition to resolving the problem of large state space effectively, AMMQL can show higher adaptability to environmental changes than pure MQL. In this paper we use the AMMQL algorithm as a learning method for dynamic positioning of the robot soccer agent, and implement a robot soccer agent system called Cogitoniks.
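
The contrast between fixed and weighted combination of module outputs can be sketched as follows. This is only an illustration under stated assumptions: the fixed rule is taken to be a plain sum of per-module Q-values, the weights are supplied by the caller, and all function names are hypothetical. The abstract does not give AMMQL's actual mediation or weight-update rule.

```python
def combine_fixed(module_q_values):
    """Fixed combination (assumed here to be a plain sum) of per-module Q-values.

    module_q_values[m][a] is module m's Q-value for action a in the current state.
    """
    n_actions = len(module_q_values[0])
    return [sum(q[a] for q in module_q_values) for a in range(n_actions)]


def combine_weighted(module_q_values, weights):
    """Weighted combination: each module's Q-values are scaled by its weight.

    How the weights are derived from each module's contribution to reward
    is not specified in the abstract, so they are passed in as given.
    """
    n_actions = len(module_q_values[0])
    return [
        sum(w * q[a] for w, q in zip(weights, module_q_values))
        for a in range(n_actions)
    ]


def select_action(combined_q):
    """Pick the action with the largest combined value (first one on ties)."""
    return max(range(len(combined_q)), key=combined_q.__getitem__)
```

With two modules that disagree, changing the weights can change the selected action, which is the flexibility the abstract attributes to the weighted scheme.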

Solving the Gale-Shapley Problem by Ant-Q learning (Ant-Q 학습을 이용한 Gale-Shapley 문제 해결에 관한 연구)

  • Kim, Hyun;Chung, Tae-Choong
    • The KIPS Transactions:PartB / v.18B no.3 / pp.165-172 / 2011
  • In this paper, we propose the Ant-Q learning algorithm[1], which draws on the habits of biological ants, as a new way to solve the Stable Marriage Problem (SMP)[3] presented by Gale-Shapley[2]. The goal of SMP is to find an optimal stable matching based on the participants' preference lists (PL). A limitation of the Gale-Shapley algorithm is that it produces a stable matching that favors only one side, male or female. We propose another approach that can satisfy various requirements for SMP. ACS (Ant Colony System) is a swarm intelligence method that finds an optimal solution using ant pheromones. We try to improve the ACS technique by adding the Q learning[9] concept. The resulting Ant-Q method can solve SMP under various requirements. The experimental results show that the proposed method is well suited to the problem.
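
For reference, the baseline that the abstract contrasts against can be sketched as the classic deferred-acceptance (Gale-Shapley) procedure. This is the textbook algorithm, not the paper's Ant-Q method, and it assumes two equal-sized groups with complete preference lists; the function and variable names are illustrative.

```python
def gale_shapley(proposer_prefs, receiver_prefs):
    """Classic deferred acceptance; returns a stable matching {proposer: receiver}.

    Each dict maps a participant to a list of the other side's members,
    most preferred first. The result is the best stable matching for the
    proposing side, which is the one-sidedness the abstract points out.
    """
    # rank[r][p] = position of proposer p in receiver r's list (lower is better).
    rank = {r: {p: i for i, p in enumerate(prefs)}
            for r, prefs in receiver_prefs.items()}
    next_choice = {p: 0 for p in proposer_prefs}  # index of next receiver to try
    engaged_to = {}                               # receiver -> current proposer
    free = list(proposer_prefs)
    while free:
        p = free.pop()
        r = proposer_prefs[p][next_choice[p]]
        next_choice[p] += 1
        current = engaged_to.get(r)
        if current is None:
            engaged_to[r] = p            # receiver was free: accept
        elif rank[r][p] < rank[r][current]:
            engaged_to[r] = p            # receiver trades up
            free.append(current)
        else:
            free.append(p)               # rejected: p will try the next choice
    return {p: r for r, p in engaged_to.items()}
```

Swapping the two arguments makes the other side propose, and the two runs can return different stable matchings for the same preference lists.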

Q-Learning Policy Design to Speed Up Agent Training (에이전트 학습 속도 향상을 위한 Q-Learning 정책 설계)

  • Yong, Sung-jung;Park, Hyo-gyeong;You, Yeon-hwi;Moon, Il-young
    • Journal of Practical Engineering Education / v.14 no.1 / pp.219-224 / 2022
  • Q-Learning is widely used as a basic algorithm for reinforcement learning. It trains the agent to maximize reward through greedy actions, selecting the action with the largest estimated reward among those available in the current state. In this paper, we studied a policy that can speed up agent training using Q-Learning in the Frozen Lake 8×8 grid environment. We compared the training results of the existing Q-Learning algorithm with those of an algorithm that gives agent movement a 'direction' attribute. The analysis showed that the proposed Q-Learning policy can significantly increase both accuracy and training speed compared with the general algorithm.
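
In standard textbook notation, the greedy selection and value update that this abstract describes are usually written as below. The symbols (learning rate $\alpha$, discount factor $\gamma$, reward $r$, next state $s'$) are the conventional ones and are assumptions here; the abstract gives neither the paper's own notation nor the details of the 'direction' attribute.

$$
a^{*} = \arg\max_{a} Q(s, a), \qquad
Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right]
$$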

Simple Q-learning using heuristic strategies (휴리스틱 전략을 이용한 Q러닝의 학습 간단화)

  • Park, Jong-cheol;Kim, Hyeon-cheol
    • Proceedings of the Korea Information Processing Society Conference / 2018.10a / pp.708-710 / 2018
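  • Reinforcement learning can serve as a replacement for conventional game AI, but it is hard to train in imperfect-information games. For a complex imperfect-information card game that is difficult to learn, we created heuristic strategies and grouped similar states together to reduce the complexity of learning. Using only Q-learning, without an artificial neural network, the agent learned which strategy to use in each state over 50,000 games. As a result, its win rate was higher than that of play using only a single fixed strategy, and we observed that it selected different strategies in different states.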

Q-learning to improve learning speed using Minimax algorithm (미니맥스 알고리즘을 이용한 학습속도 개선을 위한 Q러닝)

  • Shin, YongWoo
    • Journal of Korea Game Society / v.18 no.4 / pp.99-106 / 2018
  • Board games have many game characters and large state spaces, so learning takes a long time. This paper uses a reinforcement learning algorithm, which has a weakness: learning is slow in the early stages. We therefore tried to improve learning speed with a heuristic that uses problem-domain knowledge and considers the game tree when several actions share the same best value during learning. To compare the existing character with the improved one, we produced a board game and had the characters compete against a character that attacks one-sidedly. The improved character attacked the opponent while considering the game tree. The experiment showed that the improved character learned faster.

An analysis of Learning Attitude among the Chinese Students in Korea - focused on the Q Methodology - (한국 내 중국 유학생의 학습태도 유형 분석 - Q방법론적 접근 -)

  • Li, Zhangpei;Li, Xiaohui;Park, Changun
    • Asia-pacific Journal of Multimedia Services Convergent with Art, Humanities, and Sociology / v.7 no.6 / pp.115-123 / 2017
  • The purpose of this research is to analyze the types of learning attitude among Chinese students in Korea. For this purpose, we adopted Q methodology, which combines practical and quantitative research methods and can objectively determine an individual's ideas and behavior. The study targets Chinese students, classifies their learning attitudes into four types, and analyzes the questionnaire responses for each type. The resulting types are: dissatisfied with the learning environment; cooperating positively with the learning process and environment; lacking learning motivation; and a paradoxical learning state. Based on this discussion, Chinese students should have clear motivation to learn new things, improve their Korean language ability, and develop clear learning methods of their own. More and more Chinese students are choosing to study abroad. Learning attitudes and learning abilities are therefore two of the most important points of focus for society.