• Title/Summary/Keyword: gaussian mixture model


Implementation of the Auditory Sense for the Smart Robot: Speaker/Speech Recognition (로봇 시스템에의 적용을 위한 음성 및 화자인식 알고리즘)

  • Jo, Hyun;Kim, Gyeong-Ho;Park, Young-Jin
    • Proceedings of the Korean Society for Noise and Vibration Engineering Conference / 2007.05a / pp.1074-1079 / 2007
  • We introduce a speech/speaker recognition algorithm for isolated words. In speaker verification, a Gaussian Mixture Model (GMM) is generally used to model the feature vectors of reference speech signals. Dynamic Time Warping (DTW) based template matching, on the other hand, was proposed for isolated word recognition several years ago. We combine these two concepts in a single method and implement it in a real-time speaker/speech recognition system. With the proposed method, a small number of reference utterances (5 or 6 training repetitions) is enough to build a reference model that achieves 90% recognition performance.
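
The DTW template matching mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the names `dtw_distance` and `recognize_word`, the Euclidean local distance, and the dictionary of one template per word are all assumptions made for illustration.

```python
import math


def dtw_distance(a, b):
    """Dynamic time warping distance between two sequences of feature vectors."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(a[i - 1], b[j - 1])  # Euclidean frame distance
            cost[i][j] = local + min(
                cost[i - 1][j],      # stretch b
                cost[i][j - 1],      # stretch a
                cost[i - 1][j - 1],  # advance both
            )
    return cost[n][m]


def recognize_word(query, templates):
    """Return the label of the template closest to the query under DTW.

    `templates` is a hypothetical mapping from word label to one reference
    sequence of feature vectors.
    """
    return min(templates, key=lambda label: dtw_distance(query, templates[label]))
```

Because DTW lets one sequence stretch against the other, a query spoken more slowly than the template can still match it with a small distance.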


Performance Enhancement of Speaker Identification System Based on GMM Using the Modified EM Algorithm (수정된 EM알고리즘을 이용한 GMM 화자식별 시스템의 성능향상)

  • Kim, Seong-Jong;Chung, Ik-Joo
    • Speech Sciences / v.12 no.4 / pp.31-42 / 2005
  • Recently, the Gaussian Mixture Model (GMM), a special form of the CHMM, has been applied to speaker identification and has been shown to outperform the CHMM. In this paper, speaker models based on the GMM and on a new GMM trained with a modified EM algorithm are introduced and evaluated for text-independent speaker identification. Various experiments were performed to evaluate the identification performance of the two algorithms. The GMM speaker model attained 94.6% identification accuracy with 40 seconds of training data and 32 mixtures, and 97.8% with 80 seconds of training data and 64 mixtures. The new GMM speaker model achieved 95.0% with 40 seconds of training data and 32 mixtures, and 98.2% with 80 seconds of training data and 64 mixtures. The new GMM speaker model therefore identifies speakers more accurately than the conventional GMM speaker model.
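
For readers unfamiliar with GMM-based speaker identification, a rough sketch of the scoring step follows. It assumes diagonal covariances and models stored as lists of `(weight, mean, variance)` triples; these choices, and the function names, are illustrative assumptions and say nothing about the modified EM algorithm in the paper.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def gmm_frame_loglik(x, gmm):
    """Log-likelihood of one frame under a GMM of (weight, mean, var) triples."""
    terms = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in gmm]
    peak = max(terms)  # log-sum-exp trick for numerical stability
    return peak + math.log(sum(math.exp(t - peak) for t in terms))


def identify_speaker(frames, speaker_models):
    """Pick the speaker whose GMM gives the highest total log-likelihood."""
    scores = {
        name: sum(gmm_frame_loglik(x, gmm) for x in frames)
        for name, gmm in speaker_models.items()
    }
    return max(scores, key=scores.get), scores
```

Summing per-frame log-likelihoods treats the frames as independent, which is the usual simplification in text-independent identification.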


Analysis of Passenger Movement Patterns Using Subway OD Data (도시철도 출·도착데이터를 이용한 승객이동 패턴 분석)

  • Baik, Euiyoung;Cho, Jae Hee;Kim, Dong-Geon
    • Journal of the Korea Convergence Society / v.10 no.12 / pp.315-325 / 2019
  • The purpose of this study is to design and build a data mart with which anyone can easily analyze subway OD (origin-destination) movement patterns. Subway OD data for 2017 was downloaded from the Seoul Open Data Plaza and used as the source data. A multidimensional model was designed, and Gaussian mixture cluster analysis and visual analysis using Tableau were performed. Interestingly, movement between the suburbs and Seoul accounts for 23% of total traffic. Passengers at Suwon Station travel to the suburbs much more than to Seoul, while passengers at Pangyo Station mostly travel to Seoul. Gaussian mixture clustering found eight clusters of OD segments, each characterized by segment distance and passenger volume.

A study on user defined spoken wake-up word recognition system using deep neural network-hidden Markov model hybrid model (Deep neural network-hidden Markov model 하이브리드 구조의 모델을 사용한 사용자 정의 기동어 인식 시스템에 관한 연구)

  • Yoon, Ki-mu;Kim, Wooil
    • The Journal of the Acoustical Society of Korea / v.39 no.2 / pp.131-136 / 2020
  • A Wake-Up Word (WUW) is a short utterance used to switch a speech recognizer into recognition mode. A WUW defined by the user who actually uses the speech recognizer is called a user-defined WUW. In this paper, to recognize user-defined WUWs, we build systems based on a traditional Gaussian Mixture Model-Hidden Markov Model (GMM-HMM), Linear Discriminant Analysis (LDA)-GMM-HMM, and LDA-Deep Neural Network (DNN)-HMM, and compare their performance. To improve recognition accuracy, a threshold method is also applied to each model, which significantly reduces both the WUW error rate and the rejection failure rate for non-WUWs. For the LDA-DNN-HMM system, when the WUW error rate is 9.84%, the rejection failure rate for non-WUWs is 0.0058%, about 4.82 times lower than that of the LDA-GMM-HMM system. These results demonstrate that the LDA-DNN-HMM model developed in this paper is highly effective for building a user-defined WUW recognition system.

Estimating Simulation Parameters for Knit Fabrics from Static Drapes (정적 드레이프를 이용한 니트 옷감의 시뮬레이션 파라미터 추정)

  • Ju, Eunjung;Choi, Myung Geol
    • Journal of the Korea Computer Graphics Society / v.26 no.5 / pp.15-24 / 2020
  • We present a supervised learning method that estimates the parameters needed to simulate a fabric from the static drape shape of a fabric sample. The static drape shape was inspired by Cusick's drape, which is used in the apparel industry to classify fabrics by their mechanical properties. The input vector of the model consists of a feature vector extracted from the static drape and the density of the fabric specimen. The output vector consists of six simulation parameters that strongly influence the resulting drape. To generate a plausible and unbiased training data set, we first collect simulation parameters for 400 knit fabrics and fit a Gaussian Mixture Model (GMM) to them as a generative model. Next, a large number of simulation parameters are randomly sampled from the GMM, and a cloth simulation is run for each sample to create a virtual static drape. A log-linear regression model is fitted to the generated training data. To evaluate our method, we check the accuracy of the trained model on a test data set and compare the visual similarity of the simulated drapes.
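
Sampling from a fitted GMM, as this abstract describes, amounts to picking a component according to its weight and then drawing from that component's Gaussian. The sketch below assumes diagonal covariances and a hypothetical `(weight, mean, std)` representation; the paper's six-parameter model may well use full covariances, which this simplification ignores.

```python
import random


def sample_gmm(components, n, seed=0):
    """Draw n samples from a diagonal GMM given as (weight, mean, std) triples."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    weights = [w for w, _, _ in components]
    samples = []
    for _ in range(n):
        # Choose a component in proportion to its mixture weight.
        _, mean, std = rng.choices(components, weights=weights, k=1)[0]
        # Draw each dimension independently (diagonal covariance assumption).
        samples.append([rng.gauss(m, s) for m, s in zip(mean, std)])
    return samples
```

Each sampled vector would then stand in for one set of simulation parameters to feed into a cloth simulation.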

Improving A Text Independent Speaker Identification System By Frame Level Likelihood Normalization (프레임단위유사도정규화를 이용한 문맥독립화자식별시스템의 성능 향상)

  • 김민정;석수영;정현열;정호열
    • Proceedings of the IEEK Conference / 2001.09a / pp.487-490 / 2001
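  • In this paper, to improve the performance of an existing real-time text-independent speaker recognition system based on the Gaussian Mixture Model, we apply likelihood normalization, a method that gives good results in speaker verification systems, to a speaker identification system, implement the system, and report the results of recognition experiments. The system consists of a speaker model generation stage and a speaker identification stage. In the speaker model generation stage, speaker models are built with the GMM (Gaussian Mixture Model), which represents the acoustic characteristics of a speaker's utterances well, and MLE (Maximum Likelihood Estimation) is used to optimize the GMM parameters. In the speaker identification stage, likelihoods are computed frame by frame using ML (Maximum Likelihood) from the trained data and the test data. The computed likelihoods pass through likelihood normalization and are expressed as scores (SC), and the speaker with the highest score is chosen as the recognized speaker. Text-independent sentences were used as the utterances for speaker recognition. The ETRI445 DB and the KLE452 DB were used for the recognition experiments, and only cepstral coefficients and regression coefficients were used as feature parameters. In the experiments, the number of enrolled speakers was varied, and recognition was run with both the conventional speaker identification method and the frame-level likelihood normalization method. The results show that frame-level likelihood normalization achieves a higher recognition rate than the conventional method as the number of speakers increases.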
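
The abstract does not give the normalization formula, so the following is only one plausible reading of frame-level likelihood normalization: each frame's likelihood is divided by the sum of that frame's likelihoods over all enrolled speakers, and the normalized values are averaged over frames. The function name and the input layout (speaker name mapped to a list of per-frame log-likelihoods) are assumptions.

```python
import math


def frame_normalized_scores(frame_logliks):
    """Average per-frame normalized likelihood for each speaker.

    `frame_logliks` maps speaker name -> list of per-frame log-likelihoods;
    all lists must have the same length.
    """
    names = list(frame_logliks)
    n_frames = len(frame_logliks[names[0]])
    scores = {name: 0.0 for name in names}
    for t in range(n_frames):
        column = [frame_logliks[name][t] for name in names]
        peak = max(column)  # subtract the max before exponentiating
        weights = [math.exp(v - peak) for v in column]
        total = sum(weights)
        for name, w in zip(names, weights):
            # Each frame contributes a value in [0, 1] to each speaker.
            scores[name] += (w / total) / n_frames
    return scores
```

Under this reading, every frame's contribution is bounded, so a single badly matched frame cannot dominate the decision the way it can when raw log-likelihoods are summed.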


A Study for Video-based Vehicle Surveillance on Outdoor Road (실외 도로에서의 영상기반 차량 감시에 관한 연구)

  • Park, Keun-Soo;Kim, Hyun-Tae
    • The Journal of the Korea institute of electronic communication sciences / v.8 no.11 / pp.1647-1654 / 2013
  • Vehicle detection performance on the road depends on weather conditions, shadows caused by the movement of the sun, illumination changes, and similar factors. This paper proposes a daytime vehicle detection system combined with a background estimation algorithm that is robust to environmental changes on the road. A Gaussian Mixture Model is applied as the background estimation algorithm, and the AdaBoost algorithm is applied to detect vehicles in candidate regions. In experiments with input videos captured under various weather conditions on the same real road, the proposed algorithm proved useful for detecting vehicles on the road.

Contrast Enhancement based on Gaussian Region Segmentation (가우시안 영역 분리 기반 명암 대비 향상)

  • Shim, Woosung
    • Journal of Broadcast Engineering / v.22 no.5 / pp.608-617 / 2017
  • Contrast enhancement methods suffer from problems such as over-enhancement when the histogram distribution is non-Gaussian and a trade-off between enhancement efficiency and brightness preservation. To enhance contrast across various histogram distributions, we segment the image into regions with Gaussian distributions and then enhance the contrast of each region. First, we segment an image into several regions by fitting a GMM (Gaussian Mixture Model) with k-means clustering and EM (Expectation-Maximization) in the $L^*a^*b^*$ color space. The segmentation yields a region map and a probability map. We then apply a local contrast enhancement algorithm that shifts region means to minimize the overlap between regions and applies brightness-preserving histogram equalization. Experimental results, measured by AMBE (Absolute Mean Brightness Error) and AE (Average Entropy), show that, compared with conventional methods, the proposed region-based contrast enhancement method maintains brightness and preserves detail.

Human Motion Tracking by Combining View-based and Model-based Methods for Monocular Video Sequences (하나의 비디오 입력을 위한 모습 기반법과 모델 사용법을 혼용한 사람 동작 추적법)

  • Park, Ji-Hun;Park, Sang-Ho;Aggarwal, J.K.
    • The KIPS Transactions:PartB / v.10B no.6 / pp.657-664 / 2003
  • Reliable tracking of moving humans is essential to motion estimation, video surveillance and human-computer interface. This paper presents a new approach to human motion tracking that combines appearance-based and model-based techniques. Monocular color video is processed at both pixel level and object level. At the pixel level, a Gaussian mixture model is used to train and classify individual pixel colors. At the object level, a 3D human body model projected on a 2D image plane is used to fit the image data. Our method does not use inverse kinematics due to the singularity problem. While many others use stochastic sampling for model-based motion tracking, our method relies purely on nonlinear programming. We convert the human motion tracking problem into a nonlinear programming problem. A cost function for parameter optimization is used to estimate the degree of overlap between the foreground input image silhouette and a projected 3D model body silhouette. The overlap is computed using computational geometry by converting a set of pixels from the image domain to a polygon in the real projection plane domain. Our method is used to recognize various human motions. Motion tracking results from video sequences are very encouraging.

EM Algorithm with Initialization Based on Incremental $k$-means for GMM and Its Application to Speaker Identification (GMM을 위한 점진적 $k$-means 알고리즘에 의해 초기값을 갖는 EM알고리즘과 화자식별에의 적용)

  • Seo Changwoo;Hahn Hernsoo;Lee Kiyong;Lee Younjeong
    • The Journal of the Acoustical Society of Korea / v.24 no.3 / pp.141-149 / 2005
  • In general, the Gaussian mixture model (GMM) is used to estimate a speaker model from speech for speaker identification. The parameter estimates of the GMM are obtained with the Expectation-Maximization (EM) algorithm for maximum likelihood (ML) estimation. However, the EM algorithm has the drawbacks that it depends heavily on initialization and requires the number of mixtures to be known in advance. In this paper, to solve these problems, we propose an EM algorithm with initialization based on incremental $k$-means for the GMM. The proposed method dynamically increases the number of mixtures one at a time until the optimal number is found. Each time a mixture is added, we calculate the mutual relationship between it and each of the other mixtures. Finally, based on these mutual relationships, we estimate the optimal number of mixtures that are statistically independent. The effectiveness of the proposed method is shown by experiments on artificial data. We also performed speaker identification with the proposed method and compared it with other approaches.
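
To make the dependence on initialization concrete, here is a minimal sketch of standard EM for a one-dimensional GMM that takes the initial means as an argument (for example, centroids from a k-means run). It does not implement the paper's incremental initialization or its mutual-relationship criterion, which the abstract does not specify; the function name, the shared initial variance, and the variance floor are assumptions.

```python
import math


def em_gmm_1d(data, init_means, n_iter=50, var_floor=1e-6):
    """Fit a 1-D GMM with EM, starting from the given initial means."""
    k = len(init_means)
    means = list(init_means)
    mu = sum(data) / len(data)
    # Start every component with the overall data variance and equal weight.
    variances = [max(sum((x - mu) ** 2 for x in data) / len(data), var_floor)] * k
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: responsibility of each component for each data point.
        resp = []
        for x in data:
            p = [
                w * math.exp(-0.5 * (x - m) ** 2 / v) / math.sqrt(2.0 * math.pi * v)
                for w, m, v in zip(weights, means, variances)
            ]
            total = sum(p) or 1e-300  # guard against underflow to zero
            resp.append([pj / total for pj in p])
        # M-step: re-estimate weight, mean and variance of each component.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:
                continue  # leave an empty component unchanged
            weights[j] = nj / len(data)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var, var_floor)
    return weights, means, variances
```

Running it with different `init_means` on the same data is a quick way to see how strongly the result can depend on the starting point, and the number of components must still be chosen by the caller.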