• Title/Summary/Keyword: Gaussian Mixture Model (GMM)

Search Result 237, Processing Time 0.02 seconds

Speaker Verification Using SVM Kernel with GMM-Supervector Based on the Mahalanobis Distance (Mahalanobis 거리측정 방법 기반의 GMM-Supervector SVM 커널을 이용한 화자인증 방법)

  • Kim, Hyoung-Gook;Shin, Dong
    • The Journal of the Acoustical Society of Korea
    • /
    • v.29 no.3
    • /
    • pp.216-221
    • /
    • 2010
  • In this paper, we propose speaker verification method using Support Vector Machine (SVM) kernel with Gaussian Mixture Model (GMM)-supervector based on the Mahalanobis distance. The proposed GMM-supervector SVM kernel method is combined GMM with SVM. The GMM-supervectors are generated by GMM parameters of speaker and other speaker utterances. A speaker verification threshold of GMM-supervectors is decided by SVM kernel based on Mahalanobis distance to improve speaker verification accuracy. The experimental results for text-independent speaker verification using 20 speakers demonstrates the performance of the proposed method compared to GMM, SVM, GMM-supervector SVM kernel based on Kullback-Leibler (KL) divergence, and GMM-supervector SVM kernel based on Bhattacharyya distance.

A study on Gaussian mixture model deep neural network hybrid-based feature compensation for robust speech recognition in noisy environments (잡음 환경에 효과적인 음성 인식을 위한 Gaussian mixture model deep neural network 하이브리드 기반의 특징 보상)

  • Yoon, Ki-mu;Kim, Wooil
    • The Journal of the Acoustical Society of Korea
    • /
    • v.37 no.6
    • /
    • pp.506-511
    • /
    • 2018
  • This paper proposes an GMM(Gaussian Mixture Model)-DNN(Deep Neural Network) hybrid-based feature compensation method for effective speech recognition in noisy environments. In the proposed algorithm, the posterior probability for the conventional GMM-based feature compensation method is calculated using DNN. The experimental results using the Aurora 2.0 framework and database demonstrate that the proposed GMM-DNN hybrid-based feature compensation method shows more effective in Known and Unknown noisy environments compared to the GMM-based method. In particular, the experiments of the Unknown environments show 9.13 % of relative improvement in the average of WER (Word Error Rate) and considerable improvements in lower SNR (Signal to Noise Ratio) conditions such as 0 and 5 dB SNR.

Sound Reinforcement Based on Context Awareness for Hearing Impaired (청각장애인을 위한 상황인지기반의 음향강화기술)

  • Choi, Jae-Hun;Chang, Joon-Hyuk
    • Journal of the Institute of Electronics Engineers of Korea SP
    • /
    • v.48 no.5
    • /
    • pp.109-114
    • /
    • 2011
  • In this paper, we apply a context awareness based on Gaussian mixture model (GMM) to a sound reinforcement for hearing impaired. In our approach, the harmful sound amplified through the sound reinforcement algorithm according to context awareness based on GMM which is constructed as Mel-frequency cepstral coefficients (MFCC) feature vector from sound data. According to the experimental results, the proposed approach is found to be effective in the various acoustic environments.

Performance of GMM and ANN as a Classifier for Pathological Voice

  • Wang, Jianglin;Jo, Cheol-Woo
    • Speech Sciences
    • /
    • v.14 no.1
    • /
    • pp.151-162
    • /
    • 2007
  • This study focuses on the classification of pathological voice using GMM (Gaussian Mixture Model) and compares the results to the previous work which was done by ANN (Artificial Neural Network). Speech data from normal people and patients were collected, then diagnosed and classified into two different categories. Six characteristic parameters (Jitter, Shimmer, NHR, SPI, APQ and RAP) were chosen. Then the classification method based on the artificial neural network and Gaussian mixture method was employed to discriminate the data into normal and pathological speech. The GMM method attained 98.4% average correct classification rate with training data and 95.2% average correct classification rate with test data. The different mixture number (3 to 15) of GMM was used in order to obtain an optimal condition for classification. We also compared the average classification rate based on GMM, ANN and HMM. The proper number of mixtures on Gaussian model needs to be investigated in our future work.

  • PDF

Background Subtraction based on GMM for Night-time Video Surveillance (야간 영상 감시를 위한 GMM기반의 배경 차분)

  • Yeo, Jung Yeon;Lee, Guee Sang
    • Smart Media Journal
    • /
    • v.4 no.3
    • /
    • pp.50-55
    • /
    • 2015
  • In this paper, we present background modeling method based on Gaussian mixture model to subtract background for night-time video surveillance. In night-time video, it is hard work to distinguish the object from the background because a background pixel is similar to a object pixel. To solve this problem, we change the pixel of input frame to more advantageous value to make the Gaussian mixture model using scaled histogram stretching in preprocessing step. Using scaled pixel value of input frame, we then exploit GMM to find the ideal background pixelwisely. In case that the pixel of next frame is not included in any Gaussian, the matching test in old GMM method ignores the information of stored background by eliminating the Gaussian distribution with low weight. Therefore we consider the stacked data by applying the difference between the old mean and new pixel intensity to new mean instead of removing the Gaussian with low weight. Some experiments demonstrate that the proposed background modeling method shows the superiority of our algorithm effectively.

Speaker Normalization using Gaussian Mixture Model for Speaker Independent Speech Recognition (화자독립 음성인식을 위한 GMM 기반 화자 정규화)

  • Shin, Ok-Keun
    • The KIPS Transactions:PartB
    • /
    • v.12B no.4 s.100
    • /
    • pp.437-442
    • /
    • 2005
  • For the purpose of speaker normalization in speaker independent speech recognition systems, experiments are conducted on a method based on Gaussian mixture model(GMM). The method, which is an improvement of the previous study based on vector quantizer, consists of modeling the probability distribution of canonical feature vectors by a GMM with an appropriate number of clusters, and of estimating the warp factor of a test speaker by making use of the obtained probabilistic model. The purpose of this study is twofold: improving the existing ML based methods, and comparing the performance of what is called 'soft decision' method with that of the previous study based on vector quantizer. The effectiveness of the proposed method is investigated by recognition experiments on the TIMIT corpus. The experimental results showed that a little improvement could be obtained tv adjusting the number of clusters in GMM appropriately.

GMM based Speaker Identification using Pitch Information (피치 정보를 이용한 GMM 기반의 화자 식별)

  • Park Taesun;Hahn Minsoo
    • MALSORI
    • /
    • no.47
    • /
    • pp.121-129
    • /
    • 2003
  • This paper describes the use of pitch information for speaker identification. The recognition system is a GMM based one with 4 connected Korean digits speech database. The mean of the pitch period in voiced sections of speech are shown to be ,useful at discriminating between speakers. Utilizing this feature with Gaussian mixture model in the speaker identification system gave a marked improvement, maximum 6% improvement comparing to the baseline Gaussian mixture model.

  • PDF

Analysis and Implementation of Speech/Music Classification for 3GPP2 SMV Based on GMM (3GPP2 SMV의 실시간 음성/음악 분류 성능 향상을 위한 Gaussian Mixture Model의 적용)

  • Song, Ji-Hyun;Lee, Kye-Hwan;Chang, Joon-Hyuk
    • The Journal of the Acoustical Society of Korea
    • /
    • v.26 no.8
    • /
    • pp.390-396
    • /
    • 2007
  • In this letter, we propose a novel approach to improve the performance of speech/music classification for the selectable mode vocoder(SMV) of 3GPP2 using the Gaussian mixture model(GMM) which is based on the expectation-maximization(EM) algorithm. We first present an effective analysis of the features and the classification method adopted in the conventional SMV. And then feature vectors which are applied to the GMM are selected from relevant Parameters of the SMV for the efficient speech/music classification. The performance of the proposed algorithm is evaluated under various conditions and yields better results compared with the conventional scheme of the SMV.

Feature Selection for Multi-Class Genre Classification using Gaussian Mixture Model (Gaussian Mixture Model을 이용한 다중 범주 분류를 위한 특징벡터 선택 알고리즘)

  • Moon, Sun-Kuk;Choi, Tack-Sung;Park, Young-Cheol;Youn, Dae-Hee
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.32 no.10C
    • /
    • pp.965-974
    • /
    • 2007
  • In this paper, we proposed the feature selection algorithm for multi-class genre classification. In our proposed algorithm, we developed GMM separation score based on Gaussian mixture model for measuring separability between two genres. Additionally, we improved feature subset selection algorithm based on sequential forward selection for multi-class genre classification. Instead of setting criterion as entire genre separability measures, we set criterion as worst genre separability measure for each sequential selection step. In order to assess the performance proposed algorithm, we extracted various features which represent characteristics such as timbre, rhythm, pitch and so on. Then, we investigate classification performance by GMM classifier and k-NN classifier for selected features using conventional algorithm and proposed algorithm. Proposed algorithm showed improved performance in classification accuracy up to 10 percent for classification experiments of low dimension feature vector especially.

Frequency Domain Double-Talk Detector Based on Gaussian Mixture Model (주파수 영역에서의 Gaussian Mixture Model 기반의 동시통화 검출 연구)

  • Lee, Kyu-Ho;Chang, Joon-Hyuk
    • The Journal of the Acoustical Society of Korea
    • /
    • v.28 no.4
    • /
    • pp.401-407
    • /
    • 2009
  • In this paper, we propose a novel method for the cross-correlation based double-talk detection (DTD), which employing the Gaussian Mixture Model (GMM) in the frequency domain. The proposed algorithm transforms the cross correlation coefficient used in the time domain into 16 channels in the frequency domain using the discrete fourier transform (DFT). The channels are then selected into seven feature vectors for GMM and we identify three different regions such as far-end, double-talk and near-end speech using the likelihood comparison based on those feature vectors. The presented DTD algorithm detects efficiently the double-talk regions without Voice Activity Detector which has been used in conventional cross correlation based double-talk detection. The performance of the proposed algorithm is evaluated under various conditions and yields better results compared with the conventional schemes. especially, show the robustness against detection errors resulting from the background noises or echo path change which one of the key issues in practical DTD.