Search | Korea Science

Implementation of the Agent using Universal On-line Q-learning by Balancing Exploration and Exploitation in Reinforcement Learning (강화 학습에서의 탐색과 이용의 균형을 통한 범용적 온라인 Q-학습이 적용된 에이전트의 구현)

박찬건;양성봉
- Journal of KIISE:Software and Applications, v.30 no.7_8, pp.672-680, 2003
A shopbot is a software agent whose goal is to maximize a buyer's satisfaction by automatically gathering price and quality information on goods and services from on-line sellers. In response to shopbots' activities, sellers on the Internet need agents called pricebots that help them maximize their own profits. In this paper we adopt Q-learning, one of the model-free reinforcement learning methods, as the price-setting algorithm of pricebots. A Q-learned agent increases profitability and eliminates cyclic price wars when compared with agents using the myoptimal (myopically optimal) pricing strategy. Q-learning needs to select a sequence of state-action pairs in order to converge. When state-action pairs are selected uniformly at random, the number of accesses to the Q-table needed to obtain the optimal Q-values is quite large, so this method is not appropriate for universal on-line learning in a real-world environment. This phenomenon occurs because the uniform random selection reflects the uncertainty of exploitation for the optimal policy. In this paper, we propose a Mixed Nonstationary Policy (MNP), which consists of both the auxiliary Markov process and the original Markov process. MNP tries to keep a balance between exploration and exploitation in reinforcement learning. Our experimental results show that the Q-learning agent using MNP converges to the optimal Q-values about 2.6 times faster on average than with uniform random selection.
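For orientation, the tabular Q-learning update that this abstract builds on can be sketched as follows. This is a minimal illustration and not the paper's MNP: the state and action names, the learning rate `alpha`, and the discount factor `gamma` are assumptions, and the selection helper implements only the uniform random baseline that the abstract compares against.

```python
import random

def q_update(q, state, action, reward, next_state, alpha=0.5, gamma=0.9):
    """Apply one tabular Q-learning update in place and return the new Q-value."""
    best_next = max(q[next_state].values())      # greedy value of the next state
    td_target = reward + gamma * best_next       # one-step bootstrapped target
    q[state][action] += alpha * (td_target - q[state][action])
    return q[state][action]

def uniform_random_pair(q, rng=random):
    """Pick a state-action pair uniformly at random (the baseline selection)."""
    pairs = [(s, a) for s in sorted(q) for a in sorted(q[s])]
    return rng.choice(pairs)

# Hypothetical two-state, two-action Q-table (names are illustrative only).
q_table = {"s0": {"a": 0.0, "b": 0.0}, "s1": {"a": 1.0, "b": 2.0}}
new_value = q_update(q_table, "s0", "a", reward=1.0, next_state="s1")
```

With these assumed numbers the update moves Q(s0, a) halfway from 0 toward the target 1 + 0.9 × 2 = 2.8. A policy that balances exploration and exploitation would replace `uniform_random_pair` with a selection rule that favors promising pairs more often.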

Off-line recognition of handwritten korean and alphanumeric characters using hidden markov models (Hidden Markov Model을 이용한 필기체 한글 및 영.숫자 오프라인 인식)

김우성;박래홍
- Journal of the Korean Institute of Telematics and Electronics B, v.31B no.9, pp.85-100, 1994
This paper proposes a recognition system for constrained handwritten Hangul and alphanumeric characters using discrete hidden Markov models (HMMs). The HMM process encodes the distortion and similarity among patterns of a class through a doubly stochastic approach. By characterizing the statistical properties of characters using selected features, a recognition system can be implemented that absorbs possible variations in form. Hangul shapes are classified into six types by fuzzy inference, and their recognition is performed on quantized features, with the features ordered optimally according to their effectiveness in each class. Constrained alphanumeric recognition is performed using the same features as Hangul recognition. The forward-backward, Viterbi, and Baum-Welch reestimation algorithms are used for training and recognition of handwritten Hangul and alphanumeric characters. Simulation results show that the proposed method recognizes handwritten Korean characters and alphanumerics effectively.
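Of the algorithms named in this abstract, Viterbi decoding for a discrete HMM is the easiest to show compactly. The sketch below is a generic textbook version and not the authors' recognizer: the two states `A` and `B`, the symbols `x`, `y`, `z`, and all probabilities are made-up stand-ins for quantized character features.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (most likely state path, joint probability of that path)."""
    # best[t][s]: probability of the best path that ends in state s at time t.
    best = [{s: start_p[s] * emit_p[s][observations[0]] for s in states}]
    back = [{}]  # back[t][s]: predecessor of s on that best path
    for obs in observations[1:]:
        scores, pointers = {}, {}
        for s in states:
            prev = max(states, key=lambda r: best[-1][r] * trans_p[r][s])
            scores[s] = best[-1][prev] * trans_p[prev][s] * emit_p[s][obs]
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)
    # Backtrack from the most probable final state.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for pointers in reversed(back[1:]):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, best[-1][last]

# Hypothetical toy model (all numbers are illustrative assumptions).
states = ["A", "B"]
start_p = {"A": 0.6, "B": 0.4}
trans_p = {"A": {"A": 0.7, "B": 0.3}, "B": {"A": 0.4, "B": 0.6}}
emit_p = {"A": {"x": 0.5, "y": 0.4, "z": 0.1},
          "B": {"x": 0.1, "y": 0.3, "z": 0.6}}
path, prob = viterbi(["x", "y", "z"], states, start_p, trans_p, emit_p)
```

In a recognizer of this kind, one such model would typically be trained per character class, and the class whose model scores the observation sequence highest would be chosen. For long sequences, log-probabilities are usually used instead to avoid underflow.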

Gait State Classification by HMMS for Pedestrian Inertial Navigation System (보행용 관성 항법 시스템을 위한 HMMS를 통한 걸음 단계 구분)

Park, Sang-Kyeong;Suh, Young-Soo
- The Transactions of The Korean Institute of Electrical Engineers, v.58 no.5, pp.1010-1018, 2009
An inertial navigation system for pedestrian position tracking is proposed, where the position is computed using inertial sensors mounted on shoes. Inertial navigation system (INS) errors increase with time due to inertial sensor errors, so the errors need to be reset frequently. During normal walking, there is an almost periodic zero velocity instant when a foot touches the floor. Using this fact to reduce estimation errors is called the zero velocity updating algorithm. When implementing this algorithm, it is important to know when the zero velocity interval occurs. The gait states are modeled as a Markov process and each state is estimated using the hidden Markov model smoother. With this gait estimation, the zero or nearly zero velocity interval is more accurately estimated, which helps to reduce the position estimation error.

Viscoplasticity model stochastic parameter identification: Multi-scale approach and Bayesian inference

Nguyen, Cong-Uy;Hoang, Truong-Vinh;Hadzalic, Emina;Dobrilla, Simona;Matthies, Hermann G.;Ibrahimbegovic, Adnan
- Coupled systems mechanics, v.11 no.5, pp.411-438, 2022
In this paper, we present parameter identification for inelastic and multi-scale problems. First, the theoretical background of several fundamental methods used in the upscaling process is reviewed. Several key definitions, including random fields, Bayes' theorem, polynomial chaos expansion (PCE), and the Gauss-Markov-Kalman filter, are briefly summarized. An illustrative example is given to assimilate fracture energy in a simple inelastic problem with linear hardening and softening phases. Second, parameter identification using the Gauss-Markov-Kalman filter is applied to a multi-scale problem to identify bulk and shear moduli and other material properties at the macro-scale, with data from the micro-scale as quantities of interest (QoI). The problem can also be viewed as upscaling homogenization.
https://doi.org/10.12989/csm.2022.11.5.411

A NON-MARKOVIAN EVOLUTION MODEL OF HIV POPULATION WITH BUNCHING BEHAVIOUR

Sridharan, V.;Jayshree, P.R.
- Journal of applied mathematics & informatics, v.5 no.3, pp.785-796, 1998
In this paper we propose a model of the HIV population through the method of phases with non-Markovian evolution of immigration. The analysis leads to explicit differential equations for the generating functions of the total population size. The detection process of antibodies (against the antigen of the virus) is analysed and explicit expressions for the correlation functions are provided. A measure of bunching is also introduced for some particular choices of parameters.

Performance Improvement in the Multi-Model Based Speech Recognizer for Continuous Noisy Speech Recognition (연속 잡음 음성 인식을 위한 다 모델 기반 인식기의 성능 향상에 대한 연구)

Chung, Yong-Joo
- Speech Sciences, v.15 no.2, pp.55-65, 2008
Recently, the multi-model based speech recognizer has been used quite successfully for noisy speech recognition. For the selection of the reference HMM (hidden Markov model) which best matches the noise type and SNR (signal to noise ratio) of the input testing speech, the estimation of the SNR value using the VAD (voice activity detection) algorithm and the classification of the noise type based on the GMM (Gaussian mixture model) have been done separately in the multi-model framework. As the SNR estimation process is vulnerable to errors, we propose an efficient method which can classify simultaneously the SNR values and noise types. The KL (Kullback-Leibler) distance between the single Gaussian distributions for the noise signal during the training and testing is utilized for the classification. The recognition experiments have been done on the Aurora 2 database showing the usefulness of the model compensation method in the multi-model based speech recognizer. We could also see that further performance improvement was achievable by combining the probability density function of the MCT (multi-condition training) with that of the reference HMM compensated by the D-JA (data-driven Jacobian adaptation) in the multi-model based speech recognizer.
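The KL distance between single Gaussians mentioned in this abstract has a closed form, sketched below. This is only an illustration under stated assumptions: the univariate case, the direction KL(test || trained), and the condition labels such as `babble_10dB` are all hypothetical and not taken from the paper.

```python
import math

def kl_gaussian(mean_p, std_p, mean_q, std_q):
    """KL(P || Q) for univariate Gaussians P = N(mean_p, std_p^2) and Q = N(mean_q, std_q^2)."""
    if std_p <= 0 or std_q <= 0:
        raise ValueError("standard deviations must be positive")
    return (math.log(std_q / std_p)
            + (std_p ** 2 + (mean_p - mean_q) ** 2) / (2.0 * std_q ** 2)
            - 0.5)

def closest_condition(test_stats, trained_stats):
    """Return the label of the trained noise model with the smallest KL distance.

    test_stats is (mean, std) of the test noise; trained_stats maps a label
    (a joint noise-type/SNR condition) to the (mean, std) seen in training.
    """
    return min(trained_stats,
               key=lambda label: kl_gaussian(*test_stats, *trained_stats[label]))

# Hypothetical noise statistics for two training conditions.
trained = {"babble_10dB": (0.0, 1.0), "car_0dB": (3.0, 2.0)}
best_label = closest_condition((0.1, 1.0), trained)
```

Because each label stands for one noise type at one SNR, a single nearest-model lookup of this kind selects both at once, which is the idea of simultaneous classification described above.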

Simulating phase transition phenomena of the unitary cell model

Kim, Dong-Hoh
- Journal of the Korean Data and Information Science Society, v.20 no.1, pp.225-235, 2009
Lattice process models are used to explain phase transitions in statistical mechanics, a branch of physics. The Ising model, a specific form of lattice process model, was proposed by Ising in 1925. Since then, variants of the Ising model such as the Potts model and the unitary cell model have been proposed. Like the Ising model, these more general models are believed to exhibit phase transitions on a critical surface defined by a mathematical equation. In a statistical sense, phase transitions can be simulated through Markov chain Monte Carlo (MCMC). We apply the Swendsen-Wang algorithm, a block Gibbs algorithm, to general lattice process models and simulate phase transition phenomena of the unitary cell model.

Measuring Unemployment Durations of Different Types of Workers (실업지속기간의 측정모형)

Choi, Chang-Kon
- Journal of the Korea Academia-Industrial cooperation Society, v.13 no.4, pp.1603-1608, 2012
This paper aims to build a model of unemployment duration in which each type of unemployment duration can be defined as a function of other exogenous variables. Recently, the so-called mismatch in the labor market has become a major issue in most countries. It is clear that 'mismatch' is deeply related to long durations of unemployment; the two problems may be two sides of the same coin. Employing a simple analysis of a Markov stochastic process, the model of unemployment duration developed here is useful for seeing the effects of shocks on unemployment duration. The model allows us to distinguish the determinants of different kinds of unemployment and to identify the nature of unemployment duration.
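To make the link between a Markov process and unemployment duration concrete, here is a minimal sketch. It assumes a two-state chain (unemployed, employed) with a constant per-period probability `exit_prob` of leaving unemployment; that simplification and the parameter name are assumptions for illustration, not the paper's model.

```python
def expected_duration(exit_prob):
    """Expected number of periods unemployed under a constant per-period exit probability.

    The duration is then geometrically distributed, so its mean is 1 / exit_prob.
    """
    if not 0.0 < exit_prob <= 1.0:
        raise ValueError("exit_prob must be in (0, 1]")
    return 1.0 / exit_prob

def survival(exit_prob, periods):
    """Probability of still being unemployed after the given number of periods."""
    return (1.0 - exit_prob) ** periods

# A hypothetical shock that halves the exit probability doubles the expected duration.
before = expected_duration(0.25)
after = expected_duration(0.125)
```

Making `exit_prob` a function of exogenous variables, separately for each worker type, would be one way to let shocks feed through to duration in the spirit of the abstract.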
https://doi.org/10.5762/KAIS.2012.13.4.1603

A Nuclide Transport Model in the Fractured Rock Medium Using a Continuous Time Markov Process (연속시간 마코프 프로세스를 이용한 균열암반매질에서의 핵종이동 모델)

Lee, Y.M.;Kang, C.H.;Hahn, P.S.;Park, H.H.;Lee, K.J.
- Nuclear Engineering and Technology, v.25 no.4, pp.529-538, 1993
A stochastic approach using a continuous time Markov process is presented to model one-dimensional nuclide transport in a fractured rock matrix, as an extension of previous work [1]. A nuclide migration model based on the continuous time Markov process for a single planar fractured rock matrix is compared with a conventional deterministic analytical solution; the medium is treated as a transient system in which the process by which the nuclide diffuses from the fracture into the rock matrix may no longer be time homogeneous. The primary desired quantities from a stochastic model are the expected values and variances of the state variables as functions of time. The time-dependent probability distributions of nuclides are presented for each discretized compartment of the medium, given the intensities of transition. Since this model is discrete in medium space, parameters that affect nuclide transport can easily be incorporated for such heterogeneous media as the fractured rock matrix and layered porous media. The model was shown to be sensitive to the number of discretized compartments, exhibiting numerical dispersion as the number of compartments is decreased; nevertheless, with a small compensation of the dispersion coefficient, the model agrees well with the analytical solution.
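The compartment picture in this abstract can be illustrated with a small continuous-time Markov chain. The sketch below is an assumption-laden toy, not the paper's model: it uses constant (time-homogeneous) transition intensities, a hypothetical three-compartment chain, and a plain explicit-Euler integration of the master equation dp/dt = pQ.

```python
def evolve(p0, rates, t_end, dt=1e-3):
    """Explicit-Euler integration of dp/dt = p Q for a continuous-time Markov chain.

    p0[i] is the initial probability of compartment i, and rates[i][j] is the
    transition intensity from compartment i to compartment j (i != j).
    """
    n = len(p0)
    p = list(p0)
    for _ in range(int(round(t_end / dt))):
        flow = [0.0] * n
        for i in range(n):
            for j in range(n):
                if i != j and rates[i][j] > 0.0:
                    moved = p[i] * rates[i][j] * dt  # probability mass moving i -> j
                    flow[i] -= moved
                    flow[j] += moved
        p = [pi + fi for pi, fi in zip(p, flow)]
    return p

# Hypothetical chain: inlet compartment -> middle compartment -> outlet (absorbing).
rates = [[0.0, 1.0, 0.0],
         [0.0, 0.0, 0.5],
         [0.0, 0.0, 0.0]]
p = evolve([1.0, 0.0, 0.0], rates, t_end=1.0)
```

If N nuclides move independently, the expected count in compartment i at time t is N times `p[i]`. Using fewer, coarser compartments smears the front, which is the numerical dispersion the abstract refers to.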

Optimal Network Defense Strategy Selection Based on Markov Bayesian Game

Wang, Zengguang;Lu, Yu;Li, Xi;Nie, Wei
- KSII Transactions on Internet and Information Systems (TIIS), v.13 no.11, pp.5631-5652, 2019
Existing defense strategy selection methods based on game theory generally select the optimal defense strategy in the form of a mixed strategy. However, it is hard for network managers to understand and implement a defense strategy in this form. To address this problem, we construct an incomplete-information stochastic game model for dynamic analysis that predicts the multi-stage attack-defense process by combining Bayesian game theory and the Markov decision-making method. In addition, the payoffs are quantified from the impact value of attack-defense actions. On this basis, we design an optimal defense strategy selection method, which selects the optimal defense strategy using defense effectiveness as the criterion. The feasibility of the proposed method is verified via a representative experiment. Compared with classical strategy selection methods based on game theory, the proposed method can select the optimal strategy of the multi-stage attack-defense process in the form of a pure strategy, which proves more operable than the compared methods.
https://doi.org/10.3837/tiis.2019.11.020

Search Result 368, Processing Time 0.022 seconds
