• Title/Summary/Keyword: Discrete HMM

Search Result 61, Processing Time 0.024 seconds

A study on the lip shape recognition algorithm using 3-D Model (3차원 모델을 이용한 입모양 인식 알고리즘에 관한 연구)

  • 남기환;배철수
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.6 no.5
    • /
    • pp.783-788
    • /
    • 2002
  • Recently, research and developmental direction of communication system is concurrent adopting voice data and face image in speaking to provide more higher recognition rate then in the case of only voice data. Therefore, we present a method of lipreading in speech image sequence by using the 3-D facial shape model. The method use a feature information of the face image such as the opening-level of lip, the movement of jaw, and the projection height of lip. At first, we adjust the 3-D face model to speeching face Image sequence. Then, to get a feature information we compute variance quantity from adjusted 3-D shape model of image sequence and use the variance quality of the adjusted 3-D model as recognition parameters. We use the intensity inclination values which obtaining from the variance in 3-D feature points as the separation of recognition units from the sequential image. After then, we use discrete HMM algorithm at recognition process, depending on multiple observation sequence which considers the variance of 3-D feature point fully. As a result of recognition experiment with the 8 Korean vowels and 2 Korean consonants, we have about 80% of recognition rate for the plosives md vowels.

Detection of Gradual Transitions in MPEG Compressed Video using Hidden Markov Model (은닉 마르코프 모델을 이용한 MPEG 압축 비디오에서의 점진적 변환의 검출)

  • Choi, Sung-Min;Kim, Dai-Jin;Bang, Sung-Yang
    • Journal of KIISE:Software and Applications
    • /
    • v.31 no.3
    • /
    • pp.379-386
    • /
    • 2004
  • Video segmentation is a fundamental task in video indexing and it includes two kinds of shot change detections such as the abrupt transition and the gradual transition. The abrupt shot boundaries are detected by computing the image-based distance between adjacent frames and comparing this distance with a pre-determined threshold value. However, the gradual shot boundaries are difficult to detect with this approach. To overcome this difficulty, we propose the method that detects gradual transition in the MPEG compressed video using the HMM (Hidden Markov Model). We take two different HMMs such as a discrete HMM and a continuous HMM with a Gaussian mixture model. As image features for HMM's observations, we use two distinct features such as the difference of histogram of DC images between two adjacent frames and the difference of each individual macroblock's deviations at the corresponding macroblock's between two adjacent frames, where deviation means an arithmetic difference of each macroblock's DC value from the mean of DC values in the given frame. Furthermore, we obtain the DC sequences of P and B frame by the first order approximation for a fast and effective computation. Experiment results show that we obtain the best detection and classification performance of gradual transitions when a continuous HMM with one Gaussian model is taken and two image features are used together.

A Discrete Feature Vector for Endpoint Detection of Speech with Hidden Markov Model (숨은마코프모형을 이용하는 음성 끝점 검출을 위한 이산 특징벡터)

  • Lee, Jei-Ky;Oh, Chang-Hyuck
    • The Korean Journal of Applied Statistics
    • /
    • v.21 no.6
    • /
    • pp.959-967
    • /
    • 2008
  • The purpose of this paper is to suggest a discrete feature vector, robust in various levels of noisy environment and inexpensive in computation, for detection of speech segments and is to show such properties of the feature with real speech data. The suggested feature is one dimensional vector which represents slope of short term energies and is discretized into three values to reduce computational burden of computations in HMM. In experiments with speech data, the method with the suggested feature vector showed good performance even in noisy environments.

New Postprocessing Methods for Rejectin Out-of-Vocabulary Words

  • Song, Myung-Gyu
    • The Journal of the Acoustical Society of Korea
    • /
    • v.16 no.3E
    • /
    • pp.19-23
    • /
    • 1997
  • The goal of postprocessing in automatic speech recognition is to improve recognition performance by utterance verification at the output of recognition stage. It is focused on the effective rejection of out-of vocabulary words based on the confidence score of hypothesized candidate word. We present two methods for computing confidence scores. Both methods are based on the distance between each observation vector and the representative code vector, which is defined by the most likely code vector at each state. While the first method employs simple time normalization, the second one uses a normalization technique based on the concept of on-line garbage mode[1]. According to the speaker independent isolated words recognition experiment with discrete density HMM, the second method outperforms both the first one and conventional likelihood ratio scoring method[2].

  • PDF

Recognition of Emotion and Emotional Speech Based on Prosodic Processing

  • Kim, Sung-Ill
    • The Journal of the Acoustical Society of Korea
    • /
    • v.23 no.3E
    • /
    • pp.85-90
    • /
    • 2004
  • This paper presents two kinds of new approaches, one of which is concerned with recognition of emotional speech such as anger, happiness, normal, sadness, or surprise. The other is concerned with emotion recognition in speech. For the proposed speech recognition system handling human speech with emotional states, total nine kinds of prosodic features were first extracted and then given to prosodic identifier. In evaluation, the recognition results on emotional speech showed that the rates using proposed method increased more greatly than the existing speech recognizer. For recognition of emotion, on the other hands, four kinds of prosodic parameters such as pitch, energy, and their derivatives were proposed, that were then trained by discrete duration continuous hidden Markov models(DDCHMM) for recognition. In this approach, the emotional models were adapted by specific speaker's speech, using maximum a posteriori(MAP) estimation. In evaluation, the recognition results on emotional states showed that the rates on the vocal emotions gradually increased with an increase of adaptation sample number.

Study On the Robustness Of Four Different Face Authentication Methods Under Illumination Changes (얼굴인증 방법들의 조명변화에 대한 견인성 연구)

  • 고대영;천영하;김진영;이주헌
    • Proceedings of the IEEK Conference
    • /
    • 2003.07e
    • /
    • pp.2036-2039
    • /
    • 2003
  • This paper focuses on the study of the robustness of face authentication methods under illumination changes. Four different face authentication methods are tried. These methods are as follows; Principal Component Analysis, Gaussian Mixture Models, 1-Dimensional Hidden Markov Models, 2-Dimensional Hidden Markov Models. Experiment results involving an artificial illumination change to face images are compared with each others. Face feature vector extraction method based on the 2-Dimensional Discrete Cosine Transform is used. Experiments to evaluate the above four different face authentication methods are carried out on the Olivetti Research Laboratory(ORL) face database. For the pseudo 2D HMM, the best EER (Equal Error Rate) performance is observed.

  • PDF

A Korean Flight Reservation System Using Continuous Speech Recognition

  • Choi, Jong-Ryong;Kim, Bum-Koog;Chung, Hyun-Yeol;Nakagawa, Seiichi
    • The Journal of the Acoustical Society of Korea
    • /
    • v.15 no.3E
    • /
    • pp.60-65
    • /
    • 1996
  • This paper describes on the Korean continuous speech recognition system for flight reservation. It adopts a frame-synchronous One-Pass DP search algorithm driven by syntactic constraints of context free grammar(CFG). For recognition, 48 phoneme-like units(PLU) were defined and used as basic units for acoustic modeling of Korean. This modeling was conducted using a HMM technique, where each model has 4-states 3-continuous output probability distributions and 3-discrete-duration distributions. Language modeling by CFG was also applied to the task domain of flight reservation, which consisted of 346 words and 422 rewriting rules. In the tests, the sentence recognition rate of 62.6% was obtained after speaker adaptation.

  • PDF

Online korean character recognition using letter spotting method (자소 탐색 방법에 의한 온라인 한글 필기 인식)

  • 조범준
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.21 no.6
    • /
    • pp.1379-1389
    • /
    • 1996
  • Hangul character always consists of consonants-vowel-consonants in order. Using this point, this paper proposes an approach to design a model for spotting each letter in Hangul, and then recognize characters based on the spotting results. The network model consist of a set of HMMs. The letter search is carried out by Viterbi algorithm, while character recognition is performed by searching the lattice of letter hypotheses. Experimental results show that, in spite of simple architecture of recognition, the performance is quite high reaching 87.47% for discrete regular characters. In particular the approach shows highly plausible segmentation of letters in characters.

  • PDF

Speech Recognition in Noisy Environments using Wiener Filtering (Wiener Filtering을 이용한 잡음환경에서의 음성인식)

  • Kim, Jin-Young;Eom, Ki-Wan;Choi, Hong-Sub
    • Speech Sciences
    • /
    • v.1
    • /
    • pp.277-283
    • /
    • 1997
  • In this paper, we present a robust recognition algorithm based on the Wiener filtering method as a research tool to develop the Korean Speech recognition system. We especially used Wiener filtering method in cepstrum-domain, because the method in frequency-domain is computationally expensive and complex. Evaluation of the effectiveness of this method has been conducted in speaker-independent isolated Korean digit recognition tasks using discrete HMM speech recognition systems. In these tasks, we used 12th order weighted cepstral as a feature vector and added computer simulated white gaussian noise of different levels to clean speech signals for recognition experiments under noisy conditions. Experimental results show that the presented algorithm can provide an improvement in recognition of as much as from $5\%\;to\;\20\%$ in comparison to spectral subtraction method.

  • PDF

A study on the lip shape recognition algorithm using 3-D Model (3차원 모델을 이용한 입모양 인식 알고리즘에 관한 연구)

  • 김동수;남기환;한준희;배철수;나상동
    • Proceedings of the Korean Institute of Information and Commucation Sciences Conference
    • /
    • 1998.11a
    • /
    • pp.181-185
    • /
    • 1998
  • Recently, research and developmental direction of communication system is concurrent adopting voice data and face image in speaking to provide more higher recognition rate then in the case of only voice data. Therefore, we present a method of lipreading in speech image sequence by using the 3-D facial shape model. The method use a feature information of the face image such as the opening-level of lip, the movement of jaw, and the projection height of lip. At first, we adjust the 3-D face model to speeching face image sequence. Then, to get a feature information we compute variance quantity from adjusted 3-D shape model of image sequence and use the variance quality of the adjusted 3-D model as recognition parameters. We use the intensity inclination values which obtaining from the variance in 3-D feature points as the separation of recognition units from the sequential image. After then, we use discrete HMM algorithm at recognition process, depending on multiple observation sequence which considers the variance of 3-D feature point fully. As a result of recognition experiment with the 8 Korean vowels and 2 Korean consonants, we have about 80% of recognition rate for the plosives and vowels.

  • PDF