• Title/Abstract/Keywords: Action based learning

Function Approximation for Reinforcement Learning using Fuzzy Clustering

  • 이영아;정경숙;정태충
    • 정보처리학회논문지B
    • /
    • Vol. 10B, No. 6
    • /
    • pp.587-592
    • /
    • 2003
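  • Many real-world control problems suited to reinforcement learning have continuous states or actions. With continuous values, the state space becomes enormous, so learning every state-action pair runs into memory and time limits. Addressing this requires a function approximation method that infers values for new states from similar states that have already been learned. This paper proposes Fuzzy Q-Map, which is based on fuzzy clustering, for function approximation in 1-step Q-learning. Fuzzy Q-Map uses each cluster's membership degree for the data to group similar states, select actions, and look up Q-values. The center and Q-values of the winning fuzzy cluster are updated using the membership degree and the TD (temporal difference) error. Applied to the mountain car problem, the proposed method converged quickly.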
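
The update described in the abstract above can be sketched roughly as follows. This is a minimal illustration, not the authors' Fuzzy Q-Map: the fuzzy c-means style membership formula, the fuzzifier `m`, the learning rates `alpha` and `beta`, and the function names are all assumptions made for the example, and the center update here uses only the membership degree.

```python
import math

def memberships(state, centers, m=2.0):
    """Fuzzy c-means style membership of `state` in each cluster (sums to 1)."""
    dists = [math.dist(state, c) for c in centers]
    # A state sitting exactly on a center belongs fully to that cluster.
    for i, d in enumerate(dists):
        if d == 0.0:
            return [1.0 if j == i else 0.0 for j in range(len(centers))]
    power = 2.0 / (m - 1.0)
    inv = [d ** -power for d in dists]
    total = sum(inv)
    return [v / total for v in inv]

def q_values(state, centers, q_table):
    """Membership-weighted Q-value for every action (assumed lookup rule)."""
    mu = memberships(state, centers)
    n_actions = len(q_table[0])
    return [sum(mu[i] * q_table[i][a] for i in range(len(centers)))
            for a in range(n_actions)]

def update(state, action, reward, next_state, centers, q_table,
           alpha=0.1, gamma=0.99, beta=0.05):
    """One 1-step Q-learning update applied to the winning cluster only."""
    mu = memberships(state, centers)
    winner = max(range(len(centers)), key=lambda i: mu[i])
    td_error = (reward + gamma * max(q_values(next_state, centers, q_table))
                - q_values(state, centers, q_table)[action])
    # Q-value of the winner moves by the TD error, scaled by its membership.
    q_table[winner][action] += alpha * mu[winner] * td_error
    # Assumed center rule: pull the winner toward the visited state.
    centers[winner] = [c + beta * mu[winner] * (s - c)
                       for c, s in zip(centers[winner], state)]
    return td_error
```

Here `centers` is a list of cluster centers and `q_table[i][a]` holds the Q-value of action `a` for cluster `i`; only the cluster with the highest membership is changed on each step.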

Fuzzy Inference-based Reinforcement Learning for Recurrent Neural Network

  • 전효병;이동욱;김대준;심귀보
    • 한국지능시스템학회:학술대회논문집
    • /
    • 한국퍼지및지능시스템학회 1997년도 춘계학술대회 학술발표 논문집
    • /
    • pp.120-123
    • /
    • 1997
  • In this paper, we propose a fuzzy inference-based reinforcement learning algorithm. By using fuzzy inference in reinforcement learning, we offer a learning scheme closer to the psychological learning of higher animals, including humans. The proposed method follows the way linguistic and conceptual expressions affect human behavior by inferring reinforcement from fuzzy rules. The intervals of the fuzzy membership functions are optimized by genetic algorithms, and a recurrent state is used so that actions can be chosen in a dynamic environment. We show the validity of the proposed learning algorithm by applying it to the inverted pendulum control problem.

Reflective Abstraction and Operational Instruction of Mathematics

  • 우정호;홍진곤
    • 대한수학교육학회지:수학교육학연구
    • /
    • Vol. 9, No. 2
    • /
    • pp.383-404
    • /
    • 1999
  • This study began with an epistemological question about the nature of mathematical cognition in relation to the learner's activity. By examining Piaget's theory of 'reflective abstraction', which offers one answer to this question, we sought implications for mathematics education in practice. 'Reflective abstraction' is formed through the coordination of the epistemic subject's actions, while 'empirical abstraction' is formed from the properties of observable concrete objects. Piaget distinguished these two kinds of abstraction because the foundation for the peculiar objectivity and inevitability can be derived from the coordination of actions shared by all epistemic subjects. Moreover, because the mechanism of reflective abstraction, unlike empirical abstraction, does not construct a new operation by simply changing the result of the previous construction but carries out a re-construction that includes the previously constructed structure as a special case, the system developed by this mechanism can remain rational throughout. The mechanism of the re-construction of the intellectual system through reflective abstraction can be explained as a continuous spiral alternation between two complementary processes, 'réfléchissement' and 'réflexion': réfléchissement is the movement of the action to a higher level through the processes of 'intériorisation' and 'thématisation'; réflexion is a process of 'equilibration' between the assimilation and the accommodation of the imbalance caused by the change of level. The operational learning principle of theorists such as Aebli, who sought to embody Piaget's operational constructivism, attempts to explain the construction of operations through the 'internalization' of action, but does not sufficiently emphasize the integration of structures through the 'coordination' of actions and the ensuing discontinuous evolution of the learning level.
Thus, based on an examination of the essential characteristics and mechanism of reflective abstraction, this study presents the following principles of teaching and learning: (1) the principle of the operational interpretation of knowledge, (2) the principle of the structural interpretation of the operation, (3) the principle of intériorisation, (4) the principle of thématisation, (5) the principle of coordination, réflexion, and integration, and (6) the principle of the discontinuous evolution of the learning level.

Spatial-Temporal Scale-Invariant Human Action Recognition using Motion Gradient Histogram

  • 김광수;김태형;곽수영;변혜란
    • 한국정보과학회논문지:소프트웨어및응용
    • /
    • Vol. 34, No. 12
    • /
    • pp.1075-1082
    • /
    • 2007
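  • This paper proposes a method that detects the actions of multiple people appearing in a video and recognizes each detected action individually. To make recognition robust to changes in the speed or size at which an action is performed, a spatial-temporal pyramid scheme is applied. The action representation is the Motion Gradient Histogram (MGH), which is based on statistical characteristics and was chosen to minimize the complexity of the recognition process. To detect multiple actions, the Motion Energy Image (MEI) method, which accumulates binary difference images, is applied to obtain the individual action regions efficiently. Each region is represented by an MGH, the pyramid scheme is applied for robustness to size changes, and the result is compared for similarity against learned template MGHs to obtain the final recognition result. To evaluate recognition performance, experiments on 10 videos covered single objects, multiple objects, speed and size changes, comparison with existing methods, and other additional tests, and confirmed good recognition results on videos under a variety of conditions.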
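
As a rough illustration of the pipeline in the abstract above, the sketch below accumulates binary frame differences into a motion energy image and then compares orientation histograms. It is not the paper's MGH: the gradient-orientation histogram, the `threshold` value, the bin count, and the histogram-intersection similarity are generic stand-ins chosen for the example, and frames are assumed to be grayscale images stored as nested lists.

```python
import math

def motion_energy_image(frames, threshold=20):
    """Accumulate binary frame differences: a pixel is 1 if it moved in any step."""
    height, width = len(frames[0]), len(frames[0][0])
    mei = [[0] * width for _ in range(height)]
    for prev, curr in zip(frames, frames[1:]):
        for y in range(height):
            for x in range(width):
                if abs(curr[y][x] - prev[y][x]) > threshold:
                    mei[y][x] = 1
    return mei

def gradient_histogram(image, bins=8):
    """Normalized, magnitude-weighted histogram of gradient orientations."""
    hist = [0.0] * bins
    for y in range(1, len(image) - 1):
        for x in range(1, len(image[0]) - 1):
            gx = image[y][x + 1] - image[y][x - 1]   # central differences
            gy = image[y + 1][x] - image[y - 1][x]
            magnitude = math.hypot(gx, gy)
            if magnitude == 0:
                continue
            angle = math.atan2(gy, gx) % (2 * math.pi)
            hist[int(angle / (2 * math.pi) * bins) % bins] += magnitude
    total = sum(hist)
    return [h / total for h in hist] if total else hist

def histogram_intersection(h1, h2):
    """Similarity in [0, 1] between two normalized histograms."""
    return sum(min(a, b) for a, b in zip(h1, h2))
```

A detected region's histogram would be compared against each stored template histogram, and the best-scoring template gives the recognized action. Repeating the comparison at several spatial and temporal scales is one way to approximate the pyramid step.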

A Method for Learning Macro-Actions for Virtual Characters Using Programming by Demonstration and Reinforcement Learning

  • Sung, Yun-Sick;Cho, Kyun-Geun
    • Journal of Information Processing Systems
    • /
    • Vol. 8, No. 3
    • /
    • pp.409-420
    • /
    • 2012
  • The decision-making of agents in games is commonly based on reinforcement learning. To improve the quality of agents, it is necessary to solve the problems of the time and state space that are required for learning. Such problems can be solved by Macro-Actions, which are defined and executed as a sequence of primitive actions. In this line of research, the learning time is reduced by cutting down the number of policy decisions made by agents. Macro-Actions were originally defined as combinations of the same primitive actions. Based on studies that showed the generation of Macro-Actions by learning, Macro-Actions are now thought to consist of diverse kinds of primitive actions. However, an enormous amount of learning time and state space is required to generate Macro-Actions. To resolve these issues, we can apply insights from studies on the learning of tasks through Programming by Demonstration (PbD) to generate Macro-Actions that reduce the learning time and state space. In this paper, we propose a method to define and execute Macro-Actions. Macro-Actions are learned from a human subject via PbD, and a policy is learned by reinforcement learning. In an experiment, the proposed method was applied to a car simulation to verify its scalability. Data were collected from the driving control of a human subject, and the Macro-Actions required for running a car were then generated. Furthermore, the policy necessary for driving on a track was learned. The acquisition of Macro-Actions by PbD reduced the driving time by about 16% compared to the case in which Macro-Actions were directly defined by a human subject. In addition, the learning time was reduced by faster convergence of the optimum policies.
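
One simple way to picture macro-actions learned from a demonstration is to treat frequently repeated runs of primitive actions as candidates. The sketch below does that with fixed-length n-grams. It is an illustrative assumption, not the paper's PbD procedure: the function names, the `length` and `top_k` parameters, and the `step(state, action)` environment callback are hypothetical.

```python
from collections import Counter

def extract_macro_actions(demonstration, length=3, top_k=2):
    """Return the `top_k` most frequent action n-grams in a demonstration."""
    ngrams = Counter(
        tuple(demonstration[i:i + length])
        for i in range(len(demonstration) - length + 1)
    )
    return [macro for macro, _ in ngrams.most_common(top_k)]

def execute_macro(macro, step, state):
    """Run each primitive action in order; one policy decision covers them all."""
    total_reward = 0.0
    for action in macro:
        state, reward = step(state, action)
        total_reward += reward
    return state, total_reward
```

Because the policy chooses a whole macro-action at once, the number of decisions per episode drops, which is the source of the learning-time reduction the abstract describes.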

Trends in Online Action Detection in Streaming Videos

  • 문진영;김형일;이용주
    • 전자통신동향분석
    • /
    • Vol. 36, No. 2
    • /
    • pp.75-82
    • /
    • 2021
  • Online action detection (OAD) in streaming video is an attractive research area that has recently drawn interest. Although most studies on action understanding have considered action recognition in well-trimmed videos and offline temporal action detection in untrimmed videos, online action detection methods are required to monitor action occurrences in streaming videos. OAD predicts action probabilities for the current frame or frame sequence using a fixed-size video segment that includes past and current frames. In this article, we discuss deep learning-based OAD models. In addition, we review OAD evaluation methodologies, including benchmark datasets and performance measures, and compare the performances of the presented OAD models.
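
The online constraint described above, where only past and current frames are available, can be sketched with a fixed-size sliding window. This is a minimal illustration: `score_segment` stands in for any trained model and is a hypothetical callback, and `window_size` is an arbitrary choice.

```python
from collections import deque

def online_action_detection(frame_stream, score_segment, window_size=8):
    """Yield one prediction per incoming frame using past and current frames only."""
    window = deque(maxlen=window_size)
    for frame in frame_stream:
        window.append(frame)              # the oldest frame drops out automatically
        yield score_segment(list(window))
```

Until `window_size` frames have arrived the segment is shorter than the fixed size, so a real model would need padding or a warm-up period. The generator never looks ahead in the stream, which is what separates this setting from offline temporal action detection.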

A New Residual Attention Network based on Attention Models for Human Action Recognition in Video

  • Kim, Jee-Hyun;Cho, Young-Im
    • 한국컴퓨터정보학회논문지
    • /
    • Vol. 25, No. 1
    • /
    • pp.55-61
    • /
    • 2020
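  • Thanks to advances in deep learning and improvements in computing power, video-based research has recently attracted a great deal of attention. The biggest difference between video data and image data is that video contains a large amount of temporal and spatial information. Because of this volume of data, action recognition is one of the important research topics in computer vision, but recognizing human actions in moving environments such as video is a very complex and challenging task. Drawing on various studies of humans, artificial intelligence research has found that human-like attention mechanisms make for efficient recognition models. Such models are well suited to processing image information and complex continuous video information. Against this background, this paper introduces attention mechanisms into video action recognition, first attending to the human action, in order to recognize human actions in video efficiently. Its main contribution is a new 3D residual attention network that uses convolutional neural networks and is based on two attention mechanisms to identify human actions in video. In evaluation, the proposed model achieved an accuracy of up to about 90.7%.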

Development of Virtual Classroom for Computer System Architecture based on the Flash ActionScript

  • 서호준
    • 공학교육연구
    • /
    • Vol. 7, No. 4
    • /
    • pp.16-21
    • /
    • 2004
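  • This paper builds a virtual classroom using Flash animation, which can accurately depict the signal transfer scheme inside a computer, and ActionScript, the Flash programming language, which offers strong interactivity between instructor and learner. The virtual classroom lets learners operate the animations themselves: non-linear animations respond to the keyboard or mouse input entered by the user, so that self-directed learning takes place.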

Spatial-temporal Ensemble Method for Action Recognition

  • 서민석;이상우;최동걸
    • 로봇학회논문지
    • /
    • Vol. 15, No. 4
    • /
    • pp.385-391
    • /
    • 2020
  • As deep learning technology has been developed and applied to various fields, applications for recognizing human behavior are gradually shifting from single-image-based approaches to video-based approaches that include a time axis. However, unlike a 2D CNN on a single image, a 3D CNN on video requires far more computation and many more parameters because of the added time axis, so improving accuracy in action recognition is more difficult than for single images. To address this problem, we investigate and analyze various techniques for improving performance in 3D CNN-based image recognition without additional training time or parameters. We propose a temporal ensemble that uses the time axis, which exists only in video, and an ensemble over input frames. With a combination of these techniques, we achieved an accuracy improvement of up to 7.1% over the existing performance. We also show the trade-off between computational cost and accuracy.
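
A test-time ensemble along the time axis can be sketched as averaging a model's class probabilities over several clips taken from the same video. The code below is a generic illustration under assumptions, not the authors' method: `predict_clip` is a hypothetical callback for an already trained model, and the even spacing of clips, `clip_len`, and `num_clips` are arbitrary choices.

```python
def sample_clips(num_frames, clip_len, num_clips):
    """Evenly spaced start indices for `num_clips` clips of `clip_len` frames."""
    last_start = max(num_frames - clip_len, 0)
    if num_clips == 1:
        return [last_start // 2]
    return [round(i * last_start / (num_clips - 1)) for i in range(num_clips)]

def temporal_ensemble(video, predict_clip, clip_len=16, num_clips=3):
    """Average class probabilities predicted on clips along the time axis."""
    starts = sample_clips(len(video), clip_len, num_clips)
    probs = [predict_clip(video[s:s + clip_len]) for s in starts]
    return [sum(p[c] for p in probs) / len(probs) for c in range(len(probs[0]))]
```

Nothing is retrained and no parameters are added, but inference cost grows linearly with `num_clips`, which reflects the computation-accuracy trade-off mentioned in the abstract.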

A Study of Reinforcement Learning-based Cyber Attack Prediction using Network Attack Simulator (NASim)

  • 김범석;김정현;김민석
    • 반도체디스플레이기술학회지
    • /
    • Vol. 22, No. 3
    • /
    • pp.112-118
    • /
    • 2023
  • As technology advances, the need for enhanced preparedness against cyber-attacks becomes an increasingly critical problem. Therefore, it is imperative to consider various circumstances and to prepare strategic technology for cyber-attacks. This paper proposes a method to solve network security problems by applying reinforcement learning to cyber-security. In general, traditional static cyber-security methods have difficulty responding effectively to modern dynamic attack patterns. To address this, we implement cyber-attack scenarios such as 'Tiny Alpha' and 'Small Alpha' and evaluate the performance of various reinforcement learning methods using Network Attack Simulator, a cyber-attack simulation environment based on the Gymnasium (formerly OpenAI Gym) interface. In addition, we experimented with different RL algorithms: value-based methods (Q-Learning, Deep Q-Network, and Double Deep Q-Network) and policy-based methods (Actor-Critic). As a result, we observed that value-based methods with discrete action spaces consistently outperformed policy-based methods with continuous action spaces, with a performance difference ranging from a minimum of 20.9% to a maximum of 53.2%. This result not only suggests opportunities for enhancing cybersecurity strategies, but also indicates potential applications in cyber-security education and system validation across a large number of domains such as military, government, and corporate sectors.

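
Two of the value-based methods named in the abstract above, Deep Q-Network and Double Deep Q-Network, differ mainly in how the bootstrap target is formed. The sketch below shows the standard textbook targets for a single transition. It is independent of NASim and of the paper's implementation, and the function and argument names are illustrative.

```python
def dqn_target(reward, next_q_target, gamma=0.99, done=False):
    """DQN: the target network both selects and evaluates the next action."""
    if done:
        return reward
    return reward + gamma * max(next_q_target)

def double_dqn_target(reward, next_q_online, next_q_target, gamma=0.99, done=False):
    """Double DQN: the online network selects, the target network evaluates."""
    if done:
        return reward
    best = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    return reward + gamma * next_q_target[best]
```

Here `next_q_online` and `next_q_target` are the Q-values of the next state from the online and target networks. Separating selection from evaluation is what reduces the overestimation that the plain `max` introduces.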