• Title/Summary/Keyword: Gaussian Mixture Models (GMM)

Performance Evaluation of Nonkeyword Modeling and Postprocessing for Vocabulary-independent Keyword Spotting (가변어휘 핵심어 검출을 위한 비핵심어 모델링 및 후처리 성능평가)

  • Kim, Hyung-Soon;Kim, Young-Kuk;Shin, Young-Wook
    • Speech Sciences / v.10 no.3 / pp.225-239 / 2003
  • In this paper, we develop a keyword spotting system using a vocabulary-independent speech recognition technique, and investigate several non-keyword modeling and post-processing methods to improve its performance. To model non-keyword speech segments, monophone clustering and a Gaussian Mixture Model (GMM) are considered. For post-processing, we employ likelihood ratio scoring to verify the recognition results, with filler models, anti-subword models and N-best decoding results considered as alternative hypotheses. We also examine different methods of constructing anti-subword models. We evaluate the system on an automatic telephone exchange service task. The results show that GMM-based non-keyword modeling performs better than monophone clustering. In the post-processing experiment, the method using an anti-keyword model based on Kullback-Leibler distance and the N-best decoding method perform better than the other methods, reducing keyword recognition errors by more than 50% at a keyword rejection rate of 5%.
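
The likelihood ratio scoring mentioned in this abstract can be illustrated with a minimal sketch. The per-frame normalization, the threshold value, and the function names below are assumptions for illustration only, not details taken from the paper; the alternative-hypothesis score could come from a filler model, an anti-subword model, or an N-best competitor.

```python
def log_likelihood_ratio(keyword_loglik, alternative_loglik, num_frames):
    """Per-frame log-likelihood ratio between the keyword hypothesis
    and an alternative hypothesis (assumed normalization by duration)."""
    return (keyword_loglik - alternative_loglik) / num_frames


def accept_keyword(keyword_loglik, alternative_loglik, num_frames, threshold=0.0):
    """Accept the detected keyword when the ratio reaches the threshold.
    The threshold is a hypothetical tuning parameter."""
    score = log_likelihood_ratio(keyword_loglik, alternative_loglik, num_frames)
    return score >= threshold


# Example: a 10-frame segment scored by a keyword model and a filler model.
decision = accept_keyword(-100.0, -120.0, 10, threshold=1.0)
```

Raising the threshold rejects more putative keywords, which trades false alarms against missed detections.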

Study On The Robustness Of Face Authentication Methods Under illumination Changes (얼굴인증 방법들의 조명변화에 대한 견인성 비교 연구)

  • Ko Dae-Young;Kim Jin-Young;Na Seung-You
    • The KIPS Transactions:PartB / v.12B no.1 s.97 / pp.9-16 / 2005
  • This paper studies face authentication systems and the robustness of face authentication methods under illumination changes. Four face authentication methods are compared: PCA (Principal Component Analysis), GMM (Gaussian Mixture Models), 1D HMM (1-Dimensional Hidden Markov Models), and Pseudo 2D HMM (Pseudo 2-Dimensional Hidden Markov Models). Results on face images with artificial illumination changes are compared across the methods. Face feature vectors are extracted using the 2D DCT (2-Dimensional Discrete Cosine Transform). The four methods are evaluated on the ORL (Olivetti Research Laboratory) face database. The results show that EER (Equal Error Rate) performance degrades in all cases as $\delta$ varies. Without illumination changes, the EER is 2.54% for Pseudo 2D HMM, 3.18% for 1D HMM, 11.7% for PCA, and 13.38% for GMM. The 1D HMM performs better than PCA when there is no illumination change, but worse than PCA under large illumination changes ($\delta \geq 40$). The Pseudo 2D HMM gives the best EER regardless of the illumination change.

Realization a Text Independent Speaker Identification System with Frame Level Likelihood Normalization (프레임레벨유사도정규화를 적용한 문맥독립화자식별시스템의 구현)

  • 김민정;석수영;김광수;정현열
    • Journal of the Institute of Convergence Signal Processing / v.3 no.1 / pp.8-14 / 2002
  • In this paper, we implement a real-time text-independent speaker recognition system using a Gaussian mixture model, and apply a frame-level likelihood normalization method that has proven effective in verification systems. The system has three parts: front-end, training, and recognition. In the front-end, cepstral mean normalization and silence removal are applied to account for variations in each speaker's speech. In training, a Gaussian mixture model is used to model each speaker's acoustic features, and maximum likelihood estimation is used to optimize the GMM parameters. In recognition, the likelihood score is calculated at the frame level from the speaker models and the test data. Text-independent sentences are used as test sentences. The ETRI 445 and KLE 452 databases are used for training and testing, and cepstrum coefficients and regression coefficients are used as feature parameters. The experimental results show that the frame-level likelihood method achieves a higher recognition rate than the conventional method, regardless of the number of registered speakers.

Segmentation of Color Image using the Deterministic Annealing EM Algorithm (결정적 어닐링 EM 알고리즘을 이요한 칼라 영상의 분할)

  • Cho, Wan-Hyun;Park, Jong-Hyun;Park, Soon-Young
    • Journal of KIISE:Databases / v.28 no.3 / pp.324-333 / 2001
  • In this paper we present a novel color image segmentation algorithm based on a Gaussian Mixture Model (GMM). We introduce a Deterministic Annealing Expectation Maximization (DAEM) algorithm, developed using the principle of maximum entropy, to overcome the local maxima problem associated with the standard EM algorithm. In our approach, the GMM represents the multi-colored objects statistically and its parameters are estimated by the DAEM algorithm. We also develop a method for automatically determining the number of components in the Gaussian mixture model. The image is segmented according to the maximum posterior probability calculated with the GMM. The experimental results show that the proposed DAEM estimates the parameters more accurately than the standard EM, and that the method for determining the number of mixture components is very efficient. When tested on two natural images, the proposed algorithm performs much better than the traditional algorithm in segmenting the image fields.
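
As a rough illustration of how deterministic annealing changes the E-step, the sketch below computes tempered posteriors for a one-dimensional GMM. It assumes the commonly used formulation in which each component's joint probability is raised to an inverse temperature `beta` before normalization; the function name, the 1-D setting, and the parameter values are hypothetical and are not taken from the paper.

```python
import math


def tempered_responsibilities(x, weights, means, variances, beta):
    """Posterior of each component for scalar x, tempered by beta.
    beta = 1 gives the standard EM posterior; beta near 0 flattens it."""
    logs = []
    for w, mu, var in zip(weights, means, variances):
        log_joint = math.log(w) - 0.5 * (
            math.log(2.0 * math.pi * var) + (x - mu) ** 2 / var
        )
        logs.append(beta * log_joint)
    # Normalize in the log domain to avoid underflow.
    top = max(logs)
    exps = [math.exp(value - top) for value in logs]
    total = sum(exps)
    return [e / total for e in exps]


# Assumed annealing schedule: beta rises from a small value to 1.
for beta in (0.01, 0.1, 1.0):
    resp = tempered_responsibilities(0.0, [0.5, 0.5], [0.0, 10.0], [1.0, 1.0], beta)
```

At small `beta` the responsibilities are nearly uniform, which smooths the objective; as `beta` approaches 1 they approach the ordinary EM posteriors.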

Performance Improvement of EMG-Pattern Recognition Using MFCC-HMM-GMM (MFCC-HMM-GMM을 이용한 근전도(EMG)신호 패턴인식의 성능 개선)

  • Choi, Heung-Ho;Kim, Jung-Ho;Kwon, Jang-Woo
    • Journal of Biomedical Engineering Research / v.27 no.5 / pp.237-244 / 2006
  • This study proposes an approach to improving the performance of EMG (electromyogram) pattern recognition. The MFCC (Mel-Frequency Cepstral Coefficients) approach is modeled after the characteristics of the human hearing organ. While it supplies the most typical features in the frequency domain, it must be reorganized to detect features in EMG signals. The dynamic aspects of EMG are also important for tasks such as continuous prosthetic control or recognition of EMG signals of varying time length, which most approaches have not handled successfully. Thus, this paper proposes a reorganized MFCC and an HMM-GMM, which is adaptable to the dynamic features of the signal. Moreover, an analysis of the most suitable system settings for EMG pattern recognition is required. To meet this requirement, this study balances the recognition rate against the error rates produced by the various settings when learning from the EMG data for each motion.

Extracting Patterns of Airport Approach Using Gaussian Mixture Models and Analyzing the Overshoot Probabilities (가우시안 혼합모델을 이용한 공항 접근 패턴 추출 및 패턴 별 과이탈 확률 분석)

  • Jaeyoung Ryu;Seong-Min Han;Hak-Tae Lee
    • Journal of Advanced Navigation Technology / v.27 no.6 / pp.888-896 / 2023
  • When an aircraft is landing, it is expected that the aircraft will follow a specified approach procedure and then land at the airport. However, depending on the airport situation, neighbouring aircraft or the instructions of the air traffic controller, there can be a deviation from the specified approach. Detecting aircraft approach patterns is necessary for traffic flow and flight safety, and this paper suggests clustering techniques to identify aircraft patterns in the approach segment. The Gaussian Mixture Model (GMM), one of the machine learning techniques, is used to cluster the trajectories of aircraft, and ADS-B data from aircraft landing at the Gimhae airport in 2019 are used. The aircraft trajectories are clustered on the plane, and a total of 86 approach trajectory patterns are extracted using the centroid value of each cluster. Considering the correlation between the approach procedure pattern and overshoots, the distribution of overshoots is calculated.

Performance Improvement of a Text-Independent Speaker Identification System Using MCE Training (MCE 학습 알고리즘을 이용한 문장독립형 화자식별의 성능 개선)

  • Kim Tae-Jin;Choi Jae-Gil;Kwon Chul-Hong
    • MALSORI / no.57 / pp.165-174 / 2006
  • In this paper we use a training algorithm, MCE (Minimum Classification Error), to improve the performance of a text-independent speaker identification system. The MCE training scheme takes account of possible competing speaker hypotheses and tries to reduce the probability of incorrect hypotheses. Experiments performed on a small set speaker identification task show that the discriminant training method using MCE can reduce identification errors by up to 54% over a baseline system trained using Bayesian adaptation to derive GMM (Gaussian Mixture Models) speaker models from a UBM (Universal Background Model).

Performance Improvement of Classification Between Pathological and Normal Voice Using HOS Parameter (HOS 특징 벡터를 이용한 장애 음성 분류 성능의 향상)

  • Lee, Ji-Yeoun;Jeong, Sang-Bae;Choi, Hong-Shik;Hahn, Min-Soo
    • MALSORI / no.66 / pp.61-72 / 2008
  • This paper proposes a method to improve pathological and normal voice classification performance by combining multiple features such as auditory-based and higher-order features. Their performance is measured with Gaussian mixture models (GMMs) and linear discriminant analysis (LDA). The combination of multiple features proposed by the frame-based LDA method is shown to be effective for pathological and normal voice classification, with an 87.0% classification rate. This is a noticeable improvement of 17.72% compared to the MFCC-based GMM algorithm in terms of error reduction.

Performance Improvement in Observation Probability Computation of Gaussian Mixture Models Using GPGPU (GPGPU를 이용한 가우시안 혼합 모델의 관측확률 계산 성능 향상)

  • Kim, Hyeong-Ju;Kim, Seung-Hi;Kim, Sanghun;Jang, Gil-Jin
    • Proceedings of the Korea Information Processing Society Conference / 2012.11a / pp.148-151 / 2012
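  • General-purpose computing on graphics processing units (GPGPU) is a parallel computing architecture that uses GPUs for general-purpose computation, and it is used to improve application performance in many fields such as scientific computing. In this study, we implemented a GPGPU-based algorithm to speed up the observation probability computation, which accounts for a large share of the computation time in the Gaussian mixture models (GMMs) mainly used in speech recognizers, and reduced the computation time by a factor of about 13 compared with the existing CPU-based algorithm.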
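
For orientation, the observation probability in question can be written as a short CPU reference in plain Python. This is only a sketch under assumptions: diagonal covariances and the function name are illustrative choices, and it is not the GPU algorithm the authors implemented. It does show that each component's term is computed independently, which is the kind of work that lends itself to parallel hardware.

```python
import math


def gmm_log_likelihood(x, weights, means, variances):
    """Log observation probability of feature vector x under a GMM with
    diagonal covariances (an assumption made for this sketch)."""
    log_terms = []
    for w, mu, var in zip(weights, means, variances):
        # Each component's log density depends only on its own parameters.
        log_norm = sum(math.log(2.0 * math.pi * v) for v in var)
        mahal = sum((xi - mi) ** 2 / v for xi, mi, v in zip(x, mu, var))
        log_terms.append(math.log(w) - 0.5 * (log_norm + mahal))
    # Combine components with log-sum-exp for numerical stability.
    top = max(log_terms)
    return top + math.log(sum(math.exp(t - top) for t in log_terms))


# Example: a 2-dimensional frame scored against a two-component mixture.
score = gmm_log_likelihood(
    [0.5, -0.2],
    [0.6, 0.4],
    [[0.0, 0.0], [1.0, 1.0]],
    [[1.0, 1.0], [0.5, 0.5]],
)
```

In a recognizer this function would be evaluated for every frame and every state, which is why it tends to dominate the running time.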

Segmentation of Color Image Using the Deterministic Annealing EM Algorithm (결정적 어닐링 EM 알고리즘을 이용한 칼라 영상의 분할)

  • 박종현;박순영;조완현
    • Proceedings of the IEEK Conference / 1999.11a / pp.569-572 / 1999
  • In this paper we present a color image segmentation algorithm based on statistical models. A novel deterministic annealing Expectation Maximization (EM) formula is derived to estimate the parameters of the Gaussian Mixture Model (GMM), which represents the multi-colored objects statistically. The experimental results show that the proposed deterministic annealing EM reaches a globally optimal solution for the ML parameter estimation and that the image field is segmented efficiently using the parameter estimates.
