• Title/Summary/Keyword: Q-optimal experiment (Q-최적 실험)

Q-learning for Adaptive LQ Suboptimal Control of Discrete-time Switched Linear System (이산 시간 스위칭 선형 시스템의 적응 LQ 준최적 제어를 위한 Q-학습법)

  • Chun, Tae-Yoon;Choi, Yoon-Ho;Park, Jin-Bae
    • Proceedings of the KIEE Conference
    • /
    • 2011.07a
    • /
    • pp.1874-1875
    • /
    • 2011
  • In this paper, we propose a Q-learning algorithm for adaptive LQ suboptimal control of switched linear systems. The proposed control algorithm is based on an existing Q-learning method whose stability has been proven, and it achieves suboptimal control even when the parameters of the switched system model are unknown. Building on this algorithm, we address the uncertainty of each subsystem and the optimal adaptive control problem, which previous work on switched systems did not consider, and we verify the performance of the proposed algorithm through computer simulations.

  • PDF
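
For context on the Q-learning-for-LQ setting this abstract builds on (standard background, not material from the paper itself): for a discrete-time linear system $x_{k+1}=A x_k+B u_k$ with stage cost $x_k^\top Q x_k+u_k^\top R u_k$, the Q-function is an exact quadratic form, which is why it can be identified from input-output data without knowing $A$ and $B$:

$$Q(x,u)=\begin{bmatrix}x\\u\end{bmatrix}^{\top}\underbrace{\begin{bmatrix}Q+A^{\top}PA & A^{\top}PB\\ B^{\top}PA & R+B^{\top}PB\end{bmatrix}}_{H}\begin{bmatrix}x\\u\end{bmatrix},\qquad u^{*}(x)=-H_{uu}^{-1}H_{ux}\,x,$$

where $P$ solves the discrete-time Riccati equation. Q-learning estimates the matrix $H$ directly; presumably a switched system as in the abstract would maintain one such estimate per subsystem (my assumption, not a claim from the paper).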

A Study of Design for Interior Permanent Magnet Synchronous Motor by using d-q Axis Equivalent Circuit Method (d-q축 등가회로 해석기법을 이용한 180 W급 IPMSM 설계에 관한 연구)

  • Kim, Young-Kyoun
    • Journal of the Korean Magnetics Society
    • /
    • v.27 no.2
    • /
    • pp.54-62
    • /
    • 2017
  • This paper presents a design of a 180 W Interior Permanent Magnet Synchronous Motor (IPMSM). An initial design is obtained through parametric design, in which the motor characteristics are computed from the parameters of the d-q axis equivalent circuit model. An optimal design is then carried out by combining design of experiments with the response surface method. Finally, the design and analysis results are verified against experimental results.
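
As background on the d-q axis equivalent circuit model mentioned above (standard IPMSM theory, not taken from the paper), the electromagnetic torque in the d-q frame is

$$T_e=\frac{3}{2}P_n\left[\lambda_m i_q+(L_d-L_q)\,i_d i_q\right],$$

where $P_n$ is the number of pole pairs, $\lambda_m$ the magnet flux linkage, and $L_d$, $L_q$ the d- and q-axis inductances; the second (reluctance) term is what an interior-magnet design exploits.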

Implementation of the Agent using Universal On-line Q-learning by Balancing Exploration and Exploitation in Reinforcement Learning (강화 학습에서의 탐색과 이용의 균형을 통한 범용적 온라인 Q-학습이 적용된 에이전트의 구현)

  • 박찬건;양성봉
    • Journal of KIISE:Software and Applications
    • /
    • v.30 no.7_8
    • /
    • pp.672-680
    • /
    • 2003
  • A shopbot is a software agent whose goal is to maximize buyer's satisfaction by automatically gathering price and quality information of goods, as well as services, from on-line sellers. In response to shopbots' activities, sellers on the Internet need agents called pricebots that can help them maximize their own profits. In this paper we adopt Q-learning, one of the model-free reinforcement learning methods, as the price-setting algorithm of pricebots. A Q-learned agent increases profitability and eliminates cyclic price wars when compared with agents using the myopically optimal (myoptimal) pricing strategy. Q-learning needs to select a sequence of state-action pairs in order to converge. When state-action pairs are selected uniformly at random, the number of accesses to the Q-table required to obtain the optimal Q-values is quite large, so this approach is not appropriate for universal on-line learning in a real-world environment. This occurs because uniform random selection reflects the uncertainty of exploitation with respect to the optimal policy. In this paper, we propose a Mixed Nonstationary Policy (MNP), which consists of both an auxiliary Markov process and the original Markov process. MNP tries to keep a balance between exploration and exploitation in reinforcement learning. Our experimental results show that the Q-learning agent using MNP converges to the optimal Q-values about 2.6 times faster on average than uniform random selection.
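
As a rough illustration of the tabular Q-learning price-setting loop described in the abstract (the MNP exploration scheme itself is not reproduced; a plain ε-greedy rule is used as a stand-in, and the price grid, rival behaviour, and profit function below are all hypothetical):

```python
import numpy as np

# Hypothetical tabular Q-learning pricebot: the state is the rival's last
# price level, the action is our next price level.
N_PRICES = 10                    # discretized price levels (assumption)
ALPHA, GAMMA, EPS = 0.1, 0.9, 0.1

Q = np.zeros((N_PRICES, N_PRICES))          # Q[state, action]
rng = np.random.default_rng(0)

def profit(our_price, rival_price):
    """Toy profit model: we only sell when we undercut the rival (assumption)."""
    return float(our_price) if our_price < rival_price else 0.0

state = rng.integers(N_PRICES)              # rival's current price level
for _ in range(50_000):
    # epsilon-greedy exploration as a simple stand-in for the paper's MNP
    action = rng.integers(N_PRICES) if rng.random() < EPS else int(np.argmax(Q[state]))
    reward = profit(action, state)
    next_state = rng.integers(N_PRICES)     # toy rival: reprices at random
    Q[state, action] += ALPHA * (reward + GAMMA * Q[next_state].max() - Q[state, action])
    state = next_state
```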

Neural-Q method based on KFD regression (KFD 회귀를 이용한 뉴럴-큐 기법)

  • 조원희;김영일;박주영
    • Proceedings of the Korean Institute of Intelligent Systems Conference
    • /
    • 2003.05a
    • /
    • pp.85-88
    • /
    • 2003
  • Q-learning, one approach to reinforcement learning, has recently been applied successfully to the Linear Quadratic Regulation (LQR) problem. In particular, since the problem can be solved by learning from suitable inputs and outputs alone, without concrete information about the system model parameters, it can be a very practical method depending on the situation. The Neural-Q technique replaces the Q-value of Q-learning with the output of an MLP (multilayer perceptron) neural network, making it possible to handle optimal control problems for nonlinear systems. However, because Neural-Q first fixes the network structure and then trains it with the backpropagation algorithm, it has the drawbacks that the network structure must be determined by trial and error and that the connection weights may converge to a local optimum. In this paper, we propose a Q-function approximation technique that uses KFD regression as the learning tool for Neural-Q and derive the related equations. Simulation studies are then used to examine the applicability of the proposed Neural-Q method.

  • PDF
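
The abstract's idea is to swap the MLP in Neural-Q for a kernel regressor. A minimal sketch of that substitution, using scikit-learn's KernelRidge as a generic kernel-regression stand-in (KFD regression itself is not implemented here, and the data and hyperparameters are synthetic):

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge

# Synthetic (state, action) samples with stand-in Q-targets; in fitted-Q style
# learning the targets would be r + gamma * min over a' of Qhat(s', a').
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 2))       # columns: state x, action u
y = X[:, 0] ** 2 + 0.5 * X[:, 1] ** 2       # hypothetical LQ-type cost targets

q_model = KernelRidge(kernel="rbf", alpha=1e-2, gamma=2.0).fit(X, y)

def greedy_action(x, model, grid=np.linspace(-1, 1, 101)):
    """Pick the action minimizing the approximated Q over a grid (cost setting)."""
    q_vals = model.predict(np.column_stack([np.full_like(grid, x), grid]))
    return grid[int(np.argmin(q_vals))]
```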

Neural-Q method based on $\varepsilon$-SVR ($\varepsilon$-SVR을 이용한 Neural-Q 기법)

  • 조원희;김영일;박주영
    • Proceedings of the Korean Institute of Intelligent Systems Conference
    • /
    • 2002.12a
    • /
    • pp.162-165
    • /
    • 2002
  • Q-learning is a reinforcement learning method that has been widely applied in many fields. It has recently been applied successfully to the Linear Quadratic Regulation (LQR) problem; in particular, since the problem can be solved through learning using only suitable inputs and outputs, without concrete information about the system model parameters, it can be a very practical alternative depending on the situation. Neural Q-learning replaces the Q-value of Q-learning with the output of an MLP (multilayer perceptron) neural network, so that optimal control problems of nonlinear systems can be handled. However, because Neural-Q fixes the network structure first and then trains it with the backpropagation algorithm, it inherits the limitations that the network structure must be determined by trial and error and that the connection weights may converge to a local optimum. Therefore, in this paper, instead of an MLP trained by backpropagation, we adopt support vector learning, whose performance has recently been recognized in many fields, as the tool for Neural-Q learning: we propose a Q-value approximation technique based on $\varepsilon$-SVR (Epsilon Support Vector Regression) and derive the related equations. Simulation studies are then used to examine the applicability of the proposed support-vector-based Neural-Q method.
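
Along the same lines, a minimal sketch of approximating Q-values with ε-SVR via sklearn.svm.SVR; the transition batch, reward, and the fitted-Q-style refitting loop below are my own illustrative assumptions, not the paper's derivation:

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(1)
GAMMA = 0.95

# Hypothetical batch of transitions (s, a, r, s') from a 1-D control task.
s = rng.uniform(-1, 1, 500)
a = rng.uniform(-1, 1, 500)
r = -(s ** 2 + 0.1 * a ** 2)                # LQ-style negative cost (assumption)
s_next = 0.9 * s + 0.5 * a
X = np.column_stack([s, a])

q_svr = SVR(kernel="rbf", C=10.0, epsilon=0.01).fit(X, r)   # initial fit on rewards
actions = np.linspace(-1, 1, 21)
for _ in range(10):                          # repeated regression on bootstrapped targets
    q_next = np.max([q_svr.predict(np.column_stack([s_next, np.full_like(s_next, u)]))
                     for u in actions], axis=0)
    q_svr = SVR(kernel="rbf", C=10.0, epsilon=0.01).fit(X, r + GAMMA * q_next)
```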

Minimum Bias Design for Polynomial Regression (다항회귀모형에 대한 최소편의 실험계획)

  • Jang, Dae-Heung;Kim, Youngil
    • The Korean Journal of Applied Statistics
    • /
    • v.28 no.6
    • /
    • pp.1227-1234
    • /
    • 2015
  • Traditional criteria for optimum experimental designs depend on the specifications of the model; however, there will be a dilemma when we do not have perfect knowledge about the model. Box and Draper (1959) suggested one direction to minimize bias that may occur in this situation. We will demonstrate some examples with exact solutions that provide a no-bias design for polynomial regression. The most interesting finding is that a design that requires less bias should allocate design points away from the border of the design space.
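
For reference, the variance-bias decomposition behind minimum bias designs (standard Box-Draper setup; the notation is mine, not the paper's): with fitted response $\hat{y}(x)$, true response $\eta(x)$, $N$ runs, and region of interest $R$ with $\Omega^{-1}=\int_R dx$, the scaled integrated mean squared error splits as

$$J=\frac{N\Omega}{\sigma^2}\int_R E\big[\hat{y}(x)-\eta(x)\big]^2\,dx=\underbrace{\frac{N\Omega}{\sigma^2}\int_R \operatorname{Var}\big[\hat{y}(x)\big]\,dx}_{V}+\underbrace{\frac{N\Omega}{\sigma^2}\int_R\big(E[\hat{y}(x)]-\eta(x)\big)^2\,dx}_{B},$$

and a minimum bias design chooses the design points to minimize the bias term $B$.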

Reinforcement Learning with Clustering for Function Approximation and Rule Extraction (함수근사와 규칙추출을 위한 클러스터링을 이용한 강화학습)

  • 이영아;홍석미;정태충
    • Journal of KIISE:Software and Applications
    • /
    • v.30 no.11
    • /
    • pp.1054-1061
    • /
    • 2003
  • Q-Learning, a representative reinforcement learning algorithm, repeats experience until the estimated values for all state-action pairs in the state space converge, in order to achieve an optimal policy. When the state space is high-dimensional or continuous, complex reinforcement learning tasks involve a very large state space and suffer from having to store all individual state values in a single table. We introduce Q-Map, a new function approximation method for obtaining classified policies. As the agent learns on-line, Q-Map groups states of similar situations and repeatedly adapts to new experiences. State-action pairs necessary for fine control are treated in the form of rules. Experiments in a maze environment and on the mountain car problem show that we can obtain classified knowledge and easily extract rules from the Q-Map.
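
As a rough, hypothetical sketch of the general idea of grouping similar states and sharing Q-values within a group (this is not the authors' Q-Map algorithm, whose on-line clustering and rule extraction are more involved):

```python
import numpy as np

class ClusteredQ:
    """Toy Q-function that shares one row of Q-values per state cluster."""

    def __init__(self, centers, n_actions, alpha=0.1, gamma=0.99):
        self.centers = np.asarray(centers, dtype=float)  # fixed centers (assumption)
        self.Q = np.zeros((len(self.centers), n_actions))
        self.alpha, self.gamma = alpha, gamma

    def cluster(self, state):
        # nearest-center assignment stands in for the paper's on-line grouping
        return int(np.argmin(np.linalg.norm(self.centers - state, axis=1)))

    def update(self, s, a, r, s_next):
        c, c_next = self.cluster(s), self.cluster(s_next)
        target = r + self.gamma * self.Q[c_next].max()
        self.Q[c, a] += self.alpha * (target - self.Q[c, a])

    def act(self, s):
        return int(np.argmax(self.Q[self.cluster(s)]))
```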

Experimental Evaluation of Levitation and Imbalance Compensation for the Magnetic Bearing System Using Discrete Time Q-Parameterization Control (이산시간 Q 매개변수화 제어를 이용한 자기축수 시스템에 대한 부상과 불평형보정의 실험적 평가)

  • ;Fumio Matsumura
    • Journal of KSNVE
    • /
    • v.8 no.5
    • /
    • pp.964-973
    • /
    • 1998
  • In this paper we propose a levitation and imbalance compensation controller design methodology for a magnetic bearing system. In order to achieve levitation and eliminate unbalance vibration at some operating speed, we use discrete-time Q-parameterization control. When the rotor speed is p = 0 there are no rotor unbalance forces, so in order to achieve levitation we choose the free parameter Q of the Q-parameterization controller such that the controller has poles on the unit circle at z = 1. However, when the rotor speed is p $\neq$ 0 there exist sinusoidal disturbance forces with frequency equal to the rotational speed, so in order to achieve asymptotic rejection of these disturbance forces, the free parameter Q is chosen such that the controller has poles on the unit circle at $z = e^{ipT_s}$ for a certain speed of rotation p ($T_s$ is the sampling period). First, we introduce the experimental setup employed in this research. Second, we give a mathematical model of the magnetic bearing in difference equation form. Third, we explain the proposed discrete-time Q-parameterization controller design methodology; the controller free parameter Q is assumed to be a proper stable transfer function. Fourth, we show that a free parameter satisfying the design objectives can be obtained by simply solving a set of linear equations rather than a complicated optimization problem. Finally, several simulation and experimental results are presented to evaluate the proposed controller; they show its effectiveness in eliminating the unbalance vibrations at the design speed of rotation.

  • PDF
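
For background on the Q-parameterization used above (the standard Youla-Kucera form; sign and factorization conventions may differ from the paper's): if the discrete-time plant admits a stable coprime factorization $P(z)=N(z)/M(z)$ with $M(z)X(z)+N(z)Y(z)=1$, then every stabilizing controller can be written as

$$C(z)=\frac{Y(z)+M(z)\,Q(z)}{X(z)-N(z)\,Q(z)},$$

with $Q(z)$ a free, stable, proper transfer function. Forcing controller poles at $z=1$ (integral action, used here for levitation) or at $z=e^{ipT_s}$ (an internal model of the synchronous unbalance disturbance) then becomes a set of linear interpolation constraints on $Q$, consistent with the abstract's remark that only linear equations need to be solved.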

Frequency and Position Dependences of Acoustically Driven Refrigerating Temperature Differences (음향구동 냉동 온도차의 주파수 및 위치 의존 특성)

  • 김용태;서상준;정성수;조문재
    • The Journal of the Acoustical Society of Korea
    • /
    • v.18 no.4
    • /
    • pp.3-10
    • /
    • 1999
  • Investigations of the temperature differences between both ends of a thermoacoustic exchanger, generated by acoustic heat transport, have been carried out as a function of the position of the TAC (Thermo-Acoustic Couple) [1] in a 68-cm-long duct. With the electric power fixed at 50 W, measurements were compared with theory while changing the frequency from 150 Hz to 300 Hz in 10 Hz steps. The frequency-position dependent distribution of the temperature difference corresponding to the Q-values was obtained by numerical simulation. From this distribution, the optimum position of the thermoacoustic exchanger and the optimum driving frequency can be determined.

  • PDF

A Dynamic Channel Assignment Method in Cellular Networks Using Reinforcement learning Method that Combines Supervised Knowledge (감독 지식을 융합하는 강화 학습 기법을 사용하는 셀룰러 네트워크에서 동적 채널 할당 기법)

  • Kim, Sung-Wan;Chang, Hyeong-Soo
    • Journal of KIISE:Computing Practices and Letters
    • /
    • v.14 no.5
    • /
    • pp.502-506
    • /
    • 2008
  • The recently proposed "potential-based" reinforcement learning (RL) method made it possible to combine multiple learnings and expert advice as supervised knowledge within an RL framework. The effectiveness of the approach has been established by a theoretical convergence guarantee to an optimal policy. In this paper, the potential-based RL method is applied to a dynamic channel assignment (DCA) problem in cellular networks. It is empirically shown that potential-based RL assigns channels more efficiently than fixed channel assignment, Maxavail, and Q-learning-based DCA, and that it converges to an optimal policy more rapidly than other RL algorithms, SARSA(0) and PRQ-learning.
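
The "potential-based" combination of supervised knowledge mentioned above is related in spirit to potential-based reward shaping; the sketch below only illustrates that generic shaping idea inside a Q-learning update (the potential function, state encoding, and sizes are hypothetical, and the paper's actual algorithm and its convergence guarantee are not reproduced here):

```python
import numpy as np

N_STATES, N_ACTIONS = 50, 10    # e.g. coarse cell-load states and candidate channels (assumption)
ALPHA, GAMMA = 0.1, 0.95
Q = np.zeros((N_STATES, N_ACTIONS))

def potential(state):
    """Hypothetical potential encoding expert advice: prefer low-load states."""
    return -float(state) / N_STATES

def shaped_update(s, a, r, s_next):
    # Potential-based shaping adds F = gamma * phi(s') - phi(s) to the reward,
    # which does not change the optimal policy.
    f = GAMMA * potential(s_next) - potential(s)
    Q[s, a] += ALPHA * (r + f + GAMMA * Q[s_next].max() - Q[s, a])
```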