Search | Korea Science

A Function Approximation Method for Q-learning of Reinforcement Learning (강화학습의 Q-learning을 위한 함수근사 방법)

이영아;정태충
- Journal of KIISE:Software and Applications / v.31 no.11 / pp.1431-1438 / 2004
Reinforcement learning learns policies for accomplishing a task's goal from experience gained through interaction between an agent and its environment. Q-learning, the basic algorithm of reinforcement learning, suffers from the curse of dimensionality and from slow learning in the early stage of learning. To solve these problems, new function approximation methods suitable for reinforcement learning should be studied. In this paper, we propose the Fuzzy Q-Map algorithm, which is based on online fuzzy clustering. Fuzzy Q-Map is a function approximation method suited to reinforcement learning: it can learn online and express the uncertainty of the environment. We ran an experiment on the mountain car problem with Fuzzy Q-Map, and the results show that learning is accelerated in the early stage.
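For context, the one-step tabular Q-learning update that function approximation methods aim to scale beyond can be sketched as below. This is the textbook rule, not the Fuzzy Q-Map algorithm itself; the function name `q_learning_update`, the table shape, and the values of `alpha` and `gamma` are illustrative assumptions.

```python
import numpy as np


def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99, done=False):
    """Apply one tabular Q-learning update in place and return the table.

    Q is a (num_states, num_actions) float array; s, a, s_next are integer
    indices. alpha (step size) and gamma (discount) are assumed values.
    """
    # Bootstrapped target: reward plus discounted best value of the next state.
    target = r if done else r + gamma * np.max(Q[s_next])
    # Move the current estimate a fraction alpha toward the target.
    Q[s, a] += alpha * (target - Q[s, a])
    return Q


# Tiny example with two states and two actions.
Q = np.zeros((2, 2))
Q[1] = [1.0, 2.0]
q_learning_update(Q, s=0, a=1, r=1.0, s_next=1, alpha=0.5, gamma=0.9)
```

A table such as `Q` needs one row per discretized state, so it grows quickly with the number of state variables; that growth is the curse of dimensionality the abstract refers to.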

Protection for DFIG using the d-q Equivalent Circuit (d-q 등가회로를 이용한 이중여자 유도발전기 보호)

Kang, Yong-Cheol;Lee, Ji-Hoon;Kang, Hae-Gweon;Jang, Sung-Il;Kim, Yong-Gyun;Park, Goon-Cherl
- The Transactions of The Korean Institute of Electrical Engineers / v.57 no.12 / pp.2173-2178 / 2008
A doubly-fed induction generator (DFIG) system has been widely used in modern wind turbines because of its variable-speed operation, high efficiency, and small converter size. It is well known that an inter-turn fault of a generator is very difficult to detect. The DFIG system uses a wound rotor induction machine so that the magnetizing current of the generator can be fed from both the stator and the rotor. This paper proposes a protection algorithm for a DFIG using the d-q equivalent circuit in the time domain. In a DFIG, the voltages and currents of the rotor side are available as well as those of the stator. The proposed algorithm estimates the instantaneous (i.e., converted into the stationary frame) induced voltages from the rotor and the stator sides. If the difference between the two estimated induced voltages exceeds the threshold, the proposed algorithm detects the inter-turn fault. The algorithm can detect an inter-turn fault of a winding. The performance of the proposed algorithm is validated using a PSCAD/EMTDC simulator under inter-turn fault conditions and under normal operating conditions such as an external fault and a change in wind speed.

Design of an Improved Anti-Collision Unit for an RFID Reader System Based on Gen2 (Gen2 리더 시스템의 개선된 충돌방지 유닛 설계)

Sim, Jae-Hee;Lee, Yong-Joo;Lee, Yong-Surk
- The Journal of Korean Institute of Communications and Information Sciences / v.34 no.2A / pp.177-183 / 2009
In this paper, we propose an improved anti-collision algorithm and design an anti-collision unit that uses it for the 18000-6 Type C Class 1 Generation 2 standard (Gen2). The Gen2 standard uses a Q-algorithm, an incremental method built on the Dynamic Slot-Aloha algorithm, which performs better than the Slot-Aloha algorithm. Unfortunately, several parts are left unspecified: the initial $Q_{fp}$ value, the weight C, and the ending point of the algorithm. Selecting an incorrect value degrades performance. We therefore propose an improved anti-collision algorithm that clearly defines the vague parts of the existing algorithm. Simulation results showed an improved performance of up to 34.8% using an optimized value of C and the initial $Q_{fp}$ value. With the ending condition, performance is 34.7%. The anti-collision unit is designed in Verilog HDL. The module was synthesized using Synopsys' Design Compiler and the TSMC $0.2{\mu}m$ standard cell library. The synthesized result yielded 3,847 gates and was guaranteed at the proposed working frequency of 19.2 MHz.
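The slot-count adjustment commonly described for the Gen2 Q-algorithm can be sketched as below. It is an illustration under stated assumptions, not the authors' improved unit: the function names, the outcome labels, and the default `c=0.3` are hypothetical, and the bounds 0 and 15 follow the usual description of the standard's Q range.

```python
import math


def update_qfp(qfp, slot_outcome, c=0.3):
    """Return the adjusted floating-point Q after one slot.

    slot_outcome is one of "empty", "single", or "collision" (assumed labels).
    c is the weight C; its default here is an assumed, illustrative value.
    """
    if slot_outcome == "collision":
        # Too many tags per slot: grow the frame, capped at Q = 15.
        return min(15.0, qfp + c)
    if slot_outcome == "empty":
        # Too few tags per slot: shrink the frame, floored at Q = 0.
        return max(0.0, qfp - c)
    if slot_outcome == "single":
        # Exactly one tag replied: leave Qfp unchanged.
        return qfp
    raise ValueError(f"unknown slot outcome: {slot_outcome!r}")


def slot_count_q(qfp):
    """Round Qfp half up to the integer Q; the frame then has 2**Q slots."""
    return math.floor(qfp + 0.5)


# Example: a collision followed by two empty slots, starting from Qfp = 4.0.
qfp = 4.0
for outcome in ["collision", "empty", "empty"]:
    qfp = update_qfp(qfp, outcome)
```

The sketch makes the open choices visible: the starting `qfp`, the weight `c`, and when to stop iterating are exactly the parts the abstract says need to be pinned down.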

ON POSITIVE DEFINITE SOLUTIONS OF A CLASS OF NONLINEAR MATRIX EQUATION

Fang, Liang;Liu, San-Yang;Yin, Xiao-Yan
- Bulletin of the Korean Mathematical Society / v.55 no.2 / pp.431-448 / 2018
This paper is concerned with the positive definite solutions of the nonlinear matrix equation $X-A^*{\bar{X}}^{-1}A=Q$, where A, Q are given complex matrices with Q positive definite. We show that such a matrix equation always has a unique positive definite solution and that, if A is nonsingular, it also has a unique negative definite solution. Moreover, based on the Sherman-Morrison-Woodbury formula, we derive elegant relationships between solutions of $X-A^*{\bar{X}}^{-1}A=I$ and the well-studied standard nonlinear matrix equation $Y+B^*Y^{-1}B=Q$, where B, Q are uniquely determined by A. We then propose several effective numerical algorithms with linear or quadratic convergence rates for the unique positive definite solution of $X-A^*{\bar{X}}^{-1}A=Q$, including an inverse-free fixed-point iteration, a structure-preserving doubling algorithm, and a Newton algorithm. Numerical examples are presented to illustrate the effectiveness of all the theoretical results and the behavior of the considered algorithms.
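A minimal sketch of the simplest iteration one could try for this equation is the plain fixed-point map $X_{k+1}=Q+A^*{\bar{X}_k}^{-1}A$ started from $X_0=Q$. This is not the paper's inverse-free variant, doubling algorithm, or Newton algorithm; the function name, tolerance, and test matrices are assumptions chosen so that the map is a contraction.

```python
import numpy as np


def fixed_point_solve(A, Q, tol=1e-12, max_iter=1000):
    """Iterate X <- Q + A^* conj(X)^{-1} A from X = Q until the step is tiny."""
    X = Q.astype(complex)
    A_h = A.conj().T  # conjugate transpose A^*
    for _ in range(max_iter):
        X_next = Q + A_h @ np.linalg.inv(X.conj()) @ A
        # Stop when the relative change between iterates falls below tol.
        if np.linalg.norm(X_next - X) <= tol * np.linalg.norm(X_next):
            return X_next
        X = X_next
    raise RuntimeError("fixed-point iteration did not converge")


# Small example (assumed data): the spectral norm of A is below 1 and Q = I.
A = 0.5 * np.array([[1, 1j], [0, 1]])
Q = np.eye(2, dtype=complex)
X = fixed_point_solve(A, Q)

# Residual of the original equation; it should be close to zero.
residual = np.linalg.norm(X - A.conj().T @ np.linalg.inv(X.conj()) @ A - Q)
```

Each step forms an explicit inverse, which is the cost an inverse-free scheme is meant to avoid.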
https://doi.org/10.4134/BKMS.b170054

INNOVATION ALGORITHM IN ARMA PROCESS

Sreenivasan, M.;Sumathi, K.
- Journal of applied mathematics & informatics / v.5 no.2 / pp.373-382 / 1998
Most work in Time Series Analysis is based on the Auto Regressive Integrated Moving Average (ARIMA) models presented by Box and Jenkins (1976). If the data exhibit no apparent deviation from stationarity and have a rapidly decreasing autocorrelation function, then a suitable ARMA(p,q) model is fit to the given data. Selecting the orders p and q is one of the crucial steps in Time Series Analysis. Most methods for determining p and q are based on the autocorrelation function and partial autocorrelation function, as suggested by Box and Jenkins (1976). Many new techniques have emerged in the literature, and most of them are found to be of very little use in determining the orders p and q when both are non-zero. The Durbin-Levinson algorithm and the Innovation algorithm (Brockwell and Davis 1987) are used as recursive methods for computing best linear predictors in an ARMA(p,q) model. These algorithms are modified to yield an effective method for ARMA model identification, so that the orders p and q can be determined from them. The new method is developed, and its validity and usefulness are illustrated by many theoretical examples. This method can also be applied to real-world data.
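The standard Durbin-Levinson recursion mentioned above can be sketched as follows. This is the unmodified textbook recursion, not the authors' modified identification method; the function name and the AR(1) example values are assumptions. It takes autocovariances gamma(0), ..., gamma(n) and returns the partial autocorrelations together with the final one-step prediction error variance.

```python
def durbin_levinson(gamma):
    """Return (pacf, v) from autocovariances gamma[0..n].

    pacf[k-1] is the lag-k partial autocorrelation phi_kk, and v is the
    mean squared error of the best linear predictor using n past values.
    """
    phi_prev = []   # coefficients phi_{k-1,1}, ..., phi_{k-1,k-1}
    v = gamma[0]    # prediction error variance with no past values
    pacf = []
    for k in range(1, len(gamma)):
        # phi_kk = (gamma(k) - sum_j phi_{k-1,j} * gamma(k-j)) / v_{k-1}
        acc = gamma[k] - sum(phi_prev[j] * gamma[k - 1 - j] for j in range(k - 1))
        phi_kk = acc / v
        # phi_{k,j} = phi_{k-1,j} - phi_kk * phi_{k-1,k-j}
        phi = [phi_prev[j] - phi_kk * phi_prev[k - 2 - j] for j in range(k - 1)]
        phi.append(phi_kk)
        v *= 1.0 - phi_kk ** 2
        pacf.append(phi_kk)
        phi_prev = phi
    return pacf, v


# Example: AR(1) with coefficient 0.5 and unit noise variance, for which
# gamma(h) = 0.5**h / (1 - 0.5**2).
gamma = [0.5 ** h / 0.75 for h in range(4)]
pacf, v = durbin_levinson(gamma)
```

For a pure AR(p) process the partial autocorrelations vanish beyond lag p, which is why the recursion is a natural starting point for order selection; the mixed case with both p and q non-zero is the harder one the abstract addresses.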

An Adaptive Scheduling Algorithm for Manufacturing Process with Non-stationary Rework Probabilities (비안정적인 Rework 확률이 존재하는 제조공정을 위한 적응형 스케줄링 알고리즘)

Shin, Hyun-Joon;Ru, Jae-Pil
- Journal of the Korea Academia-Industrial cooperation Society / v.11 no.11 / pp.4174-4181 / 2010
This paper presents an adaptive scheduling algorithm for manufacturing processes with non-stationary rework probabilities. The proposed adaptive scheduling scheme, named the hybrid Q-learning algorithm, makes use of the non-stationary rework probability and is coupled with artificial neural networks. The proposed algorithm is evaluated by mean tardiness, and extensive computational results show that it produces very efficient schedules that are superior to those of existing dispatching algorithms.
https://doi.org/10.5762/KAIS.2010.11.11.4174

Object Tracking Algorithm of Swarm Robot System for using Polygon Based Q-Learning and Cascade SVM (다각형 기반의 Q-Learning과 Cascade SVM을 이용한 군집로봇의 목표물 추적 알고리즘)

Seo, Sang-Wook;Yang, Hyung-Chang;Sim, Kwee-Bo
- IEMEK Journal of Embedded Systems and Applications / v.3 no.2 / pp.119-125 / 2008
This paper presents a polygon-based Q-learning and Cascade Support Vector Machine algorithm for object search with multiple robots. We set up an experimental environment with ten mobile robots, twenty-five obstacles, and an object, and then sent the robots into a hallway, where some obstacles were lying about, to search for a hidden object. In the experiment, we used four different control methods: a random search; a fusion model that uses the Distance-based action making (DBAM) and Area-based action making (ABAM) processes to determine the robots' next action; and hexagon-based Q-learning and dodecagon-based Q-learning with Cascade SVM, which enhance the fusion model with the DBAM and ABAM processes.

Max-Mean N-step Temporal-Difference Learning Using Multi-Step Return (멀티-스텝 누적 보상을 활용한 Max-Mean N-Step 시간차 학습)

Hwang, Gyu-Young;Kim, Ju-Bong;Heo, Joo-Seong;Han, Youn-Hee
- KIPS Transactions on Computer and Communication Systems / v.10 no.5 / pp.155-162 / 2021
n-step TD learning combines the Monte Carlo method and one-step TD learning. With an appropriate n, it is known to outperform both the Monte Carlo method and one-step TD learning, but the best value of n is difficult to select. To address this difficulty, this paper exploits two characteristics: overestimation of Q can improve performance in the early stage of learning, and all n-step returns have similar values when Q ≈ Q^*. We propose a new learning target composed of the maximum and the mean of all k-step returns for 1 ≤ k ≤ n. Finally, in OpenAI Gym's Atari game environment, we compare the proposed algorithm with n-step TD learning and show that the proposed algorithm is superior.
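The ingredients of such a target can be sketched as below: the k-step returns for 1 ≤ k ≤ n, followed by their maximum and their mean. The abstract does not say how the two quantities are combined, so the sketch stops at computing them; the function name, argument layout, and example numbers are assumptions.

```python
def k_step_returns(rewards, bootstrap_values, gamma):
    """Return [G_1, ..., G_n] for one starting time step.

    rewards[i] is the reward received i steps after the start, and
    bootstrap_values[k-1] is the value estimate used after k steps
    (for Q-learning style targets, max over actions of Q at that state).
    """
    returns = []
    discounted_sum = 0.0
    for k, (r, v) in enumerate(zip(rewards, bootstrap_values), start=1):
        # Accumulate the discounted rewards of the first k steps.
        discounted_sum += gamma ** (k - 1) * r
        # Add the discounted bootstrap estimate to close the k-step return.
        returns.append(discounted_sum + gamma ** k * v)
    return returns


# Example with n = 2 (assumed numbers).
returns = k_step_returns(rewards=[1.0, 1.0], bootstrap_values=[2.0, 3.0], gamma=0.5)
max_return = max(returns)
mean_return = sum(returns) / len(returns)
```

Taking the maximum biases the target upward, which matches the abstract's remark about overestimation helping early learning, while the mean stays close to every k-step return once the estimates are near Q^*.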
https://doi.org/10.3745/KTCCS.2021.10.5.155

A DUAL ALGORITHM FOR MINIMAX PROBLEMS

HE SUXIANG
- Journal of applied mathematics & informatics / v.17 no.1_2_3 / pp.401-418 / 2005
In this paper, a dual algorithm based on a smoothing function of Bertsekas (1982) is established for solving unconstrained minimax problems. It is proven that, under some mild conditions, the sequence of points generated by computing unconstrained minimizers of the smoothing function with a changing parameter t converges locally to a Kuhn-Tucker point at a Q-superlinear rate. The relationship between the condition number of the Hessian matrix of the smoothing function and the parameter is studied, which also validates the convergence theory. Finally, numerical results are reported to show the effectiveness of the algorithm.

EPCglobal Class-1 Gen-2 Q-Algorithm with Tag Number Estimation (태그 수 추정을 이용한 EPCglobal Class-1 Gen-2 Q-알고리즘)

Lim, Intaek
- Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2016.10a / pp.723-725 / 2016
In the Gen-2 Q-algorithm, if the number of tags is small and the initial $Q_{fp}$ is set large, the number of empty slots will be large. On the other hand, if the initial $Q_{fp}$ is set small despite there being many tags, almost all the slots will suffer collisions. Also, if the reader selects an inappropriate weight, there will be many empty or collided slots. As a result, performance declines because the frame size does not converge quickly to the optimal point during the query round. In this paper, we propose a scheme to select the weight based on the slot-count size of the current query round through the tag number estimation and.

Search Result 690, Processing Time 0.023 seconds
