• Title/Summary/Keyword: Gaussian mixture models

Segmentation of Color Image Using the Deterministic Annealing EM Algorithm (결정적 어닐링 EM 알고리즘을 이용한 칼라 영상의 분할)

  • 박종현;박순영;조완현
    • Proceedings of the IEEK Conference / 1999.11a / pp.569-572 / 1999
  • In this paper, we present a color image segmentation algorithm based on statistical models. A novel deterministic annealing Expectation Maximization (EM) formula is derived to estimate the parameters of the Gaussian Mixture Model (GMM) that statistically represents the multi-colored objects. The experimental results show that the proposed deterministic annealing EM reaches a globally optimal solution for maximum likelihood (ML) parameter estimation and that the image field is segmented efficiently using the parameter estimates.

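The abstract above does not reproduce the derived formula. As a rough illustration only, the sketch below shows the tempered E-step that deterministic annealing EM is commonly described with: each component's weighted likelihood is raised to an inverse temperature `beta` before normalizing. The 1-D setting, the function names, and the parameter names are assumptions made for this example, not the authors' implementation.

```python
import math


def gaussian_logpdf(x, mean, var):
    """Log density of a 1-D Gaussian N(mean, var) evaluated at x."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def tempered_posteriors(x, weights, means, variances, beta):
    """Posterior of each mixture component for sample x at inverse temperature beta.

    beta = 1 gives the ordinary EM posterior; beta near 0 flattens the
    posterior toward uniform, which is what smooths the likelihood surface
    early in an annealing schedule.
    """
    logs = [
        beta * (math.log(w) + gaussian_logpdf(x, m, v))
        for w, m, v in zip(weights, means, variances)
    ]
    top = max(logs)  # subtract the maximum for numerical stability
    exps = [math.exp(value - top) for value in logs]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    # Hypothetical two-component mixture, evaluated at x = 1.0.
    for beta in (0.0, 0.25, 1.0):
        print(beta, tempered_posteriors(1.0, [0.5, 0.5], [0.0, 4.0], [1.0, 1.0], beta))
```

In a full annealing loop, `beta` would be raised gradually toward 1 with EM iterations run at each value; the schedule itself is a design choice that the abstract does not specify.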

Research about auto-segmentation via SVM (SVM을 이용한 자동 음소분할에 관한 연구)

  • 권호민;한학용;김창근;허강인
    • Proceedings of the IEEK Conference / 2003.07e / pp.2220-2223 / 2003
  • In this paper, we use Support Vector Machines (SVMs), a recently proposed learning method in the artificial neural network family, to divide continuous speech into phonemes (initial, medial, and final sounds), and then perform continuous speech recognition on the result. The phoneme decision boundary is determined by an algorithm that takes the maximum frequency within a short interval. Recognition is performed with a Continuous Hidden Markov Model (CHMM), and the results are compared with those obtained from phonemes segmented by visual inspection. The experiments confirm that the proposed SVM method is more effective for initial sounds than Gaussian Mixture Models (GMMs).


A Fast EM Algorithm for Gaussian Mixtures

  • Jung, Hye-Kyung;Seo, Byung-Tae
    • Communications for Statistical Applications and Methods / v.19 no.1 / pp.157-168 / 2012
  • The EM algorithm is the most important tool for obtaining the maximum likelihood estimator in finite mixture models because of its stability and simplicity. However, its convergence is often slow because the conventional EM algorithm is based on a large missing data space. Several techniques have been proposed in the literature to reduce the missing data space. In this paper, we review existing methods and propose a new EM algorithm for Gaussian mixtures that reduces the missing data space while preserving the stability of the conventional EM algorithm. The performance of the proposed method is compared with that of existing methods via simulation studies.
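For orientation, here is a minimal sketch of the conventional EM iteration for a 1-D Gaussian mixture, i.e. the baseline the abstract refers to, not the proposed faster variant. The function names, the unit initial variances, the equal initial weights, and the variance floor are assumptions chosen for this example.

```python
import math


def log_gauss(x, mean, var):
    """Log density of a 1-D Gaussian N(mean, var) evaluated at x."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def em_gmm_1d(data, init_means, iters=50, var_floor=1e-6):
    """Fit a 1-D Gaussian mixture with plain EM, starting from init_means."""
    k, n = len(init_means), len(data)
    weights = [1.0 / k] * k   # assumed start: equal mixing weights
    means = list(init_means)
    variances = [1.0] * k     # assumed start: unit variances
    for _ in range(iters):
        # E-step: responsibilities of each component for each sample.
        resp = []
        for x in data:
            logs = [math.log(w) + log_gauss(x, m, v)
                    for w, m, v in zip(weights, means, variances)]
            top = max(logs)
            exps = [math.exp(value - top) for value in logs]
            total = sum(exps)
            resp.append([e / total for e in exps])
        # M-step: re-estimate weight, mean and variance of each component.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:
                continue  # leave an effectively empty component unchanged
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            spread = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(spread, var_floor)
    return weights, means, variances


if __name__ == "__main__":
    samples = [0.0, 0.2, -0.2, 10.0, 10.2, 9.8]  # two well-separated clusters
    print(em_gmm_1d(samples, [1.0, 9.0]))
```

Every sample contributes a full vector of responsibilities in each E-step; this per-sample latent assignment is the "missing data" whose size the abstract identifies as the cause of slow convergence.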

PCMM-Based Feature Compensation Method Using Multiple Model to Cope with Time-Varying Noise (시변 잡음에 대처하기 위한 다중 모델을 이용한 PCMM 기반 특징 보상 기법)

  • 김우일;고한석
    • The Journal of the Acoustical Society of Korea / v.23 no.6 / pp.473-480 / 2004
  • In this paper, we propose an effective feature compensation scheme based on the speech model in order to achieve robust speech recognition. The proposed feature compensation method is based on the parallel combined mixture model (PCMM). Previous PCMM work requires a highly sophisticated procedure for estimating the combined mixture model in order to reflect the time-varying noise conditions at every utterance. The proposed schemes cope with time-varying background noise by interpolating multiple mixture models. We apply the 'data-driven' method to PCMM for more reliable model combination and introduce a frame-synchronized version for estimating the environment posteriors. To reduce the computational complexity caused by multiple models, we propose a technique for mixture sharing: statistically similar Gaussian components are selected, and smoothed versions are generated for sharing. The performance is examined on Aurora 2.0 and on a speech corpus recorded while driving a car. The experimental results indicate that the proposed schemes are effective in realizing robust speech recognition and reducing computational complexity under both simulated environments and real-life conditions.

Realization a Text Independent Speaker Identification System with Frame Level Likelihood Normalization (프레임레벨유사도정규화를 적용한 문맥독립화자식별시스템의 구현)

  • 김민정;석수영;김광수;정현열
    • Journal of the Institute of Convergence Signal Processing / v.3 no.1 / pp.8-14 / 2002
  • In this paper, we implement a real-time text-independent speaker recognition system using Gaussian mixture models, and apply a frame-level likelihood normalization method that has proven effective in verification systems. The system has three parts: front-end, training, and recognition. In the front-end, cepstral mean normalization and silence removal are applied to account for variations in each speaker's speech. In training, a Gaussian mixture model (GMM) is used to model each speaker's acoustic features, and maximum likelihood estimation is used to optimize the GMM parameters. In recognition, the likelihood score is calculated at the frame level from the speaker models and the test data. Text-independent sentences are used as test sentences. The ETRI 445 and KLE 452 databases are used for training and testing, and cepstral coefficients and regression coefficients are used as feature parameters. The experimental results show that the frame-level likelihood method achieves a higher recognition rate than the conventional method, regardless of the number of registered speakers.

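The abstract does not state the exact normalization. One form that frame-level likelihood normalization is often given, assumed here purely for illustration, divides each speaker's likelihood at every frame by the average likelihood of the other registered speakers and then accumulates the log of that ratio over frames. The input layout (`frame_logliks[t][s]` is the log-likelihood of frame `t` under speaker model `s`) and all names are hypothetical.

```python
import math


def log_mean_exp(values):
    """Numerically stable log of the arithmetic mean of exp(values)."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values) / len(values))


def normalized_scores(frame_logliks):
    """Accumulate frame-level normalized log-likelihoods for every speaker.

    Assumes at least two speakers; the background set for speaker s is
    taken to be all other registered speakers.
    """
    num_speakers = len(frame_logliks[0])
    scores = [0.0] * num_speakers
    for frame in frame_logliks:
        for s in range(num_speakers):
            others = [frame[j] for j in range(num_speakers) if j != s]
            scores[s] += frame[s] - log_mean_exp(others)
    return scores


def identify(frame_logliks):
    """Return the index of the speaker with the highest normalized score."""
    scores = normalized_scores(frame_logliks)
    return max(range(len(scores)), key=scores.__getitem__)


if __name__ == "__main__":
    # Two frames, three hypothetical speaker models.
    logliks = [[-1.0, -2.0, -3.0], [-1.5, -1.0, -4.0]]
    print(normalized_scores(logliks), identify(logliks))
```

Normalizing per frame limits the influence of individual frames that score poorly under every model, such as noisy ones, compared with summing raw log-likelihoods over the utterance.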

Feature Extraction Based on Speech Attractors in the Reconstructed Phase Space for Automatic Speech Recognition Systems

  • Shekofteh, Yasser;Almasganj, Farshad
    • ETRI Journal / v.35 no.1 / pp.100-108 / 2013
  • In this paper, a feature extraction (FE) method is proposed that is comparable to the traditional FE methods used in automatic speech recognition systems. Unlike the conventional spectral-based FE methods, the proposed method evaluates the similarities between an embedded speech signal and a set of predefined speech attractor models in the reconstructed phase space (RPS) domain. In the first step, a set of Gaussian mixture models is trained to represent the speech attractors in the RPS. Next, for a new input speech frame, a posterior-probability-based feature vector is evaluated, which represents the similarity between the embedded frame and the learned speech attractors. We conduct experiments for a speech recognition task utilizing a toolkit based on hidden Markov models, over FARSDAT, a well-known Persian speech corpus. Through the proposed FE method, we gain 3.11% absolute phoneme error rate improvement in comparison to the baseline system, which exploits the mel-frequency cepstral coefficient FE method.

Study On the Robustness Of Four Different Face Authentication Methods Under Illumination Changes (얼굴인증 방법들의 조명변화에 대한 견인성 연구)

  • 고대영;천영하;김진영;이주헌
    • Proceedings of the IEEK Conference / 2003.07e / pp.2036-2039 / 2003
  • This paper studies the robustness of face authentication methods under illumination changes. Four face authentication methods are evaluated: Principal Component Analysis, Gaussian Mixture Models, 1-Dimensional Hidden Markov Models, and 2-Dimensional Hidden Markov Models. Experimental results obtained by applying an artificial illumination change to face images are compared with one another. Face feature vectors are extracted using the 2-Dimensional Discrete Cosine Transform. The experiments evaluating the four methods are carried out on the Olivetti Research Laboratory (ORL) face database. The best EER (Equal Error Rate) performance is observed for the pseudo 2D HMM.


Performance Evaluation of Nonkeyword Modeling and Postprocessing for Vocabulary-independent Keyword Spotting (가변어휘 핵심어 검출을 위한 비핵심어 모델링 및 후처리 성능평가)

  • Kim, Hyung-Soon;Kim, Young-Kuk;Shin, Young-Wook
    • Speech Sciences / v.10 no.3 / pp.225-239 / 2003
  • In this paper, we develop a keyword spotting system using a vocabulary-independent speech recognition technique, and investigate several non-keyword modeling and post-processing methods to improve its performance. To model non-keyword speech segments, monophone clustering and the Gaussian Mixture Model (GMM) are considered. We employ a likelihood ratio scoring method in the post-processing schemes to verify the recognition results, and filler models, anti-subword models, and N-best decoding results are considered as the alternative hypothesis for likelihood ratio scoring. We also examine different methods of constructing anti-subword models. We evaluate the performance of our system on an automatic telephone exchange service task. The results show that GMM-based non-keyword modeling yields better performance than monophone clustering. In the post-processing experiment, the method using an anti-keyword model based on the Kullback-Leibler distance and the N-best decoding method perform better than the other methods, reducing keyword recognition errors by more than 50% at a keyword rejection rate of 5%.

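As a small illustration of the likelihood ratio scoring described in the abstract above, the sketch below compares a keyword hypothesis with an alternative hypothesis and accepts the keyword when the score clears a threshold. Taking the best-scoring alternative model, normalizing by the number of frames, and the threshold value are all assumptions for this example; the abstract does not give these details.

```python
def llr_score(keyword_loglik, alternative_logliks, num_frames):
    """Frame-normalized log-likelihood ratio of keyword vs. alternative.

    keyword_loglik: log-likelihood of the segment under the keyword model.
    alternative_logliks: log-likelihoods under the alternative models
        (for example filler or anti-subword models); the best one is
        used as the alternative hypothesis here (an assumption).
    num_frames: segment length, used to make scores comparable across
        segments of different duration.
    """
    alternative = max(alternative_logliks)
    return (keyword_loglik - alternative) / num_frames


def accept_keyword(keyword_loglik, alternative_logliks, num_frames, threshold):
    """Accept the detected keyword if its score reaches the threshold."""
    return llr_score(keyword_loglik, alternative_logliks, num_frames) >= threshold


if __name__ == "__main__":
    # Hypothetical scores for a 30-frame segment.
    print(llr_score(-120.0, [-150.0, -135.0], 30))
    print(accept_keyword(-120.0, [-150.0, -135.0], 30, threshold=0.2))
```

Raising the threshold rejects more true keywords while suppressing more false alarms, which is the trade-off behind reporting error reduction at a fixed keyword rejection rate.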

A Study on Background Speaker Selection Method in Speaker Verification System (화자인증 시스템에서 선정 방법에 관한 연구)

  • Choi, Hong-Sub
    • Speech Sciences / v.9 no.2 / pp.135-146 / 2002
  • Generally, a speaker verification system improves its recognition rate by normalizing the log-likelihood ratio using the model of the speaker to be verified and a background speaker model. The speaker-based cohort method is widely used for selecting the background speaker model. Recently, a Gaussian-based cohort model has been suggested as a virtually synthesized cohort model: unlike a speaker-based model, it selects, from the probability distributions of several neighboring speakers, only those close to the target speaker's probability distributions, and synthesizes a new virtual speaker model from them. It yields better results than the existing speaker-based method. This study compares the existing speaker-based background speaker models with virtual speaker models and then constructs new virtual background speaker model groups that combine them in a certain ratio. To this end, a speaker verification system using GMMs (Gaussian Mixture Models) was built, and the suggested method of selecting the virtual background speaker model was found to improve performance.


Anomalous Event Detection in Traffic Video Based on Sequential Temporal Patterns of Spatial Interval Events

  • Ashok Kumar, P.M.;Vaidehi, V.
    • KSII Transactions on Internet and Information Systems (TIIS) / v.9 no.1 / pp.169-189 / 2015
  • Detection of anomalous events from video streams is a challenging problem in many video surveillance applications. One such application that has received significant attention from the computer vision community is traffic video surveillance. In this paper, a Lossy Count based Sequential Temporal Pattern mining approach (LC-STP) is proposed for detecting spatio-temporal abnormal events (such as a traffic violation at a junction) from sequences of video streams. The proposed approach relies mainly on spatial abstractions of each object, mining frequent temporal patterns in a sequence of video frames to form a regular temporal pattern. To detect each object in every frame, the input video is first pre-processed by applying Gaussian Mixture Models. After the foreground objects are detected, tracking is carried out using block motion estimation with the three-step search method. The primitive events of each object are represented by assigning spatial and temporal symbols corresponding to their location and time information. These primitive events are analyzed to form a temporal pattern in a sequence of video frames, representing the temporal relation between the primitive events of various objects. This is repeated for each window of sequences, and the support for each temporal sequence is obtained based on LC-STP to discover regular patterns of normal events. Events deviating from these patterns are identified as anomalies. Unlike traditional frequent itemset mining methods, the proposed method generates maximal frequent patterns without candidate generation. Furthermore, experimental results show that the proposed method performs well and can detect video anomalies in real traffic video data.
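The support counting in LC-STP builds on lossy counting. The sketch below shows only the basic lossy counting scheme over a stream of plain items, as an aid to reading the abstract; it is not the paper's sequential temporal pattern miner, and the function names, the `epsilon` and `support` parameters, and the example stream are assumptions.

```python
import math


def lossy_count(stream, epsilon):
    """Approximate item frequencies over a stream with error at most epsilon * n.

    Returns (entries, n), where entries maps item -> [count, delta] and
    delta bounds how many occurrences may have been missed before the
    item was (re)inserted.
    """
    width = math.ceil(1.0 / epsilon)  # number of items per bucket
    entries = {}
    n = 0
    bucket = 1
    for item in stream:
        n += 1
        if item in entries:
            entries[item][0] += 1
        else:
            entries[item] = [1, bucket - 1]
        if n % width == 0:
            # Bucket boundary: drop items that cannot be frequent so far.
            stale = [key for key, (count, delta) in entries.items()
                     if count + delta <= bucket]
            for key in stale:
                del entries[key]
            bucket += 1
    return entries, n


def frequent_items(entries, n, support, epsilon):
    """Items whose stored count reaches (support - epsilon) * n."""
    return {key: count for key, (count, _delta) in entries.items()
            if count >= (support - epsilon) * n}


if __name__ == "__main__":
    # Hypothetical stream: two recurring symbols followed by 20 one-off symbols.
    events = ["a"] * 50 + ["b"] * 30 + ["rare%d" % i for i in range(20)]
    entries, n = lossy_count(events, epsilon=0.1)
    print(frequent_items(entries, n, support=0.25, epsilon=0.1))
```

Memory stays bounded because rare items are pruned at every bucket boundary, while any item whose true frequency is at least `support * n` is guaranteed to be reported.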