• Title/Summary/Keyword: gaussian mixture model

Search Result 417, Processing Time 0.03 seconds

An Acoustic Event Detection Method in Tunnels Using Non-negative Tensor Factorization and Hidden Markov Model (비음수 텐서 분해와 은닉 마코프 모델을 이용한 터널 환경에서의 음향 사고 검지 방법)

  • Kim, Nam Kyun;Jeon, Kwang Myung;Kim, Hong Kook
    • Asia-pacific Journal of Multimedia Services Convergent with Art, Humanities, and Sociology
    • /
    • v.8 no.9
    • /
    • pp.265-273
    • /
    • 2018
  • In this paper, we propose an acoustic event detection method in tunnels using non-negative tensor factorization (NTF) and hidden Markov model (HMM) applied to multi-channel audio signals. Incidents in tunnel are inherent to the system and occur unavoidably with known probability. Incidents can easily happen minor accidents and extend right through to major disaster. Most incident detection systems deploy visual incident detection (VID) systems that often cause false alarms due to various constraints such as night obstacles and a limit of viewing angle. To this end, the proposed method first tries to separate and detect every acoustic event, which is assumed to be an in-tunnel incident, from noisy acoustic signals by using an NTF technique. Then, maximum likelihood estimation using Gaussian mixture model (GMM)-HMMs is carried out to verify whether or not each detected event is an actual incident. Performance evaluation shows that the proposed method operates in real time and achieves high detection accuracy under simulated tunnel conditions.

A Study on a Model Parameter Compensation Method for Noise-Robust Speech Recognition (잡음환경에서의 음성인식을 위한 모델 파라미터 변환 방식에 관한 연구)

  • Chang, Yuk-Hyeun;Chung, Yong-Joo;Park, Sung-Hyun;Un, Chong-Kwan
    • The Journal of the Acoustical Society of Korea
    • /
    • v.16 no.5
    • /
    • pp.112-121
    • /
    • 1997
  • In this paper, we study a model parameter compensation method for noise-robust speech recognition. We study model parameter compensation on a sentence by sentence and no other informations are used. Parallel model combination(PMC), well known as a model parameter compensation algorithm, is implemented and used for a reference of performance comparision. We also propose a modified PMC method which tunes model parameter with an association factor that controls average variability of gaussian mixtures and variability of single gaussian mixture per state for more robust modeling. We obtain a re-estimation solution of environmental variables based on the expectation-maximization(EM) algorithm in the cepstral domain. To evaluate the performance of the model compensation methods, we perform experiments on speaker-independent isolated word recognition. Noise sources used are white gaussian and driving car noise. To get corrupted speech we added noise to clean speech at various signal-to-noise ratio(SNR). We use noise mean and variance modeled by 3 frame noise data. Experimental result of the VTS approach is superior to other methods. The scheme of the zero order VTS approach is similar to the modified PMC method in adapting mean vector only. But, the recognition rate of the Zero order VTS approach is higher than PMC and modified PMC method based on log-normal approximation.

  • PDF

Speech Recognition in Noise Environments Using SPLICE with Phonetic Information (음성학적인 정보를 포함한 SPLICE를 이용한 잡음환경에서의 음성인식)

  • Kim Doo Hee;Kim Hyung Soon
    • Proceedings of the Acoustical Society of Korea Conference
    • /
    • spring
    • /
    • pp.83-86
    • /
    • 2002
  • 훈련과정과 인식과정에서의 주변환경 잡음과 채널 특성 등의 불일치는 음성인식 성능을 급격히 저하시킨다. 이러한 불일치를 보상하기 위해서 켑스트럼 영역에서의 다양한 전처리 방법이 시도되고 있으며 최근에는 stereo 데이터와 잡음 음성의 Gaussian Mixture Model (GMM)을 이용해 보상벡터를 구하는 SPLICE 방법이 좋은 결과를 보이고 있다(1). 기존의 SPLICE가 전체 발성에 대해서 음향학적인 정보만으로 Gaussian 모델을 구하는 반면 본 논문에서는 발성에 해당하는 음소정보를 고려하여 전체 음향 공간을 각 음소에 대해 나누어서 모델링하고 각 음소에 대한 Gaussian 모델과 그 음소에 해당하는 음성데이터만을 이용하여 음소별 보상벡터가 훈련되도록 하였다. 이 경우 보상벡터는 잡음이 각 음소에 미치는 영향을 보다 자세히 나타내게 된다. Aurora 2 데이터베이스를 이용한 실험결과, 제안된 방법이 기존의 SPLICE방법에 비해 성능향상을 보였다.

  • PDF

Unsupervised Change Detection Using Iterative Mixture Density Estimation and Thresholding

  • Park, No-Wook;Chi, Kwang-Hoon
    • Proceedings of the KSRS Conference
    • /
    • 2003.11a
    • /
    • pp.402-404
    • /
    • 2003
  • We present two methods for the automatic selection of the threshold values in unsupervised change detection. Both methods consist of the same two procedures: 1) to determine the parameters of Gaussian mixtures from a difference image or ratio image, 2) to determine threshold values using the Bayesian rule for minimum error. In the first method, the Expectation-Maximization algorithm is applied for estimating the parameters of the Gaussian mixtures. The second method is based on the iterative thresholding that successively employs thresholding and estimation of the model parameters. The effectiveness and applicability of the methods proposed here are illustrated by an experiment on the multi-temporal KOMPAT-1 EOC images.

  • PDF

Performance Improvement in the Multi-Model Based Speech Recognizer for Continuous Noisy Speech Recognition (연속 잡음 음성 인식을 위한 다 모델 기반 인식기의 성능 향상에 대한 연구)

  • Chung, Yong-Joo
    • Speech Sciences
    • /
    • v.15 no.2
    • /
    • pp.55-65
    • /
    • 2008
  • Recently, the multi-model based speech recognizer has been used quite successfully for noisy speech recognition. For the selection of the reference HMM (hidden Markov model) which best matches the noise type and SNR (signal to noise ratio) of the input testing speech, the estimation of the SNR value using the VAD (voice activity detection) algorithm and the classification of the noise type based on the GMM (Gaussian mixture model) have been done separately in the multi-model framework. As the SNR estimation process is vulnerable to errors, we propose an efficient method which can classify simultaneously the SNR values and noise types. The KL (Kullback-Leibler) distance between the single Gaussian distributions for the noise signal during the training and testing is utilized for the classification. The recognition experiments have been done on the Aurora 2 database showing the usefulness of the model compensation method in the multi-model based speech recognizer. We could also see that further performance improvement was achievable by combining the probability density function of the MCT (multi-condition training) with that of the reference HMM compensated by the D-JA (data-driven Jacobian adaptation) in the multi-model based speech recognizer.

  • PDF

Performance Enhancement for Speaker Verification Using Incremental Robust Adaptation in GMM (가무시안 혼합모델에서 점진적 강인적응을 통한 화자확인 성능개선)

  • Kim, Eun-Young;Seo, Chang-Woo;Lim, Yong-Hwan;Jeon, Seong-Chae
    • The Journal of the Acoustical Society of Korea
    • /
    • v.28 no.3
    • /
    • pp.268-272
    • /
    • 2009
  • In this paper, we propose a Gaussian Mixture Model (GMM) based incremental robust adaptation with a forgetting factor for the speaker verification. Speaker recognition system uses a speaker model adaptation method with small amounts of data in order to obtain a good performance. However, a conventional adaptation method has vulnerable to the outlier from the irregular utterance variations and the presence noise, which results in inaccurate speaker model. As time goes by, a rate in which new data are adapted to a model is reduced. The proposed algorithm uses an incremental robust adaptation in order to reduce effect of outlier and use forgetting factor in order to maintain adaptive rate of new data on GMM based speaker model. The incremental robust adaptation uses a method which registers small amount of data in a speaker recognition model and adapts a model to new data to be tested. Experimental results from the data set gathered over seven months show that the proposed algorithm is robust against outliers and maintains adaptive rate of new data.

A Hardware Implementation of EGML-based Moving Object Detection Algorithm (EGML 기반 이동 객체 검출 알고리듬의 하드웨어 구현)

  • Kim, Gyeong-hun;An, Hyo-sik;Shin, Kyung-wook
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.19 no.10
    • /
    • pp.2380-2388
    • /
    • 2015
  • A hardware implementation of MOD(moving object detection) algorithm using EGML(effective Gaussian mixture learning)- based background subtraction to detect moving objects in video is described. Some approximations of EGML calculations are applied to reduce hardware complexity, and pipelining technique is adopted to improve operating speed. The MOD processor designed in Verilog-HDL has been verified by FPGA-in-the-loop verification using MATLAB/Simulink. The MOD processor has 2,218 slices on the Virtex5-XC5VSX95T FPGA device and its throughput is 102 MSamples/s at 102 MHz clock frequency. Evaluation results of the MOD processor for 12 images in the IEEE CDW-2012 dataset show that the average recall value is 0.7631, the average precision value is 0.7778 and the average F-measure value is 0.7535.

Emotion Recognition using Prosodic Feature Vector and Gaussian Mixture Model (운율 특성 벡터와 가우시안 혼합 모델을 이용한 감정인식)

  • Kwak, Hyun-Suk;Kim, Soo-Hyun;Kwak, Yoon-Keun
    • Proceedings of the Korean Society for Noise and Vibration Engineering Conference
    • /
    • 2002.11b
    • /
    • pp.762-766
    • /
    • 2002
  • This paper describes the emotion recognition algorithm using HMM(Hidden Markov Model) method. The relation between the mechanic system and the human has just been unilateral so far. This is the why people don't want to get familiar with multi-service robots of today. If the function of the emotion recognition is granted to the robot system, the concept of the mechanic part will be changed a lot. Pitch and Energy extracted from the human speech are good and important factors to classify the each emotion (neutral, happy, sad and angry etc.), which are called prosodic features. HMM is the powerful and effective theory among several methods to construct the statistical model with characteristic vector which is made up with the mixture of prosodic features

  • PDF

Performance Improvement of SPLICE-based Noise Compensation for Robust Speech Recognition (강인한 음성인식을 위한 SPLICE 기반 잡음 보상의 성능향상)

  • Kim, Hyung-Soon;Kim, Doo-Hee
    • Speech Sciences
    • /
    • v.10 no.3
    • /
    • pp.263-277
    • /
    • 2003
  • One of major problems in speech recognition is performance degradation due to the mismatch between the training and test environments. Recently, Stereo-based Piecewise LInear Compensation for Environments (SPLICE), which is frame-based bias removal algorithm for cepstral enhancement using stereo training data and noisy speech model as a mixture of Gaussians, was proposed and showed good performance in noisy environments. In this paper, we propose several methods to improve the conventional SPLICE. First we apply Cepstral Mean Subtraction (CMS) as a preprocessor to SPLICE, instead of applying it as a postprocessor. Secondly, to compensate residual distortion after SPLICE processing, two-stage SPLICE is proposed. Thirdly we employ phonetic information for training SPLICE model. According to experiments on the Aurora 2 database, proposed method outperformed the conventional SPLICE and we achieved a 50% decrease in word error rate over the Aurora baseline system.

  • PDF

Digitally Modulated Signal Classification based on Higher Order Statistics of Cyclostationary Process (순환정상 프로세스의 고차 통계 특성을 이용한 디지털 변조인식)

  • Ahn, Woo-Hyun;Nah, Sun-Phil;Seo, Bo-Seok
    • Journal of Broadcast Engineering
    • /
    • v.19 no.2
    • /
    • pp.195-204
    • /
    • 2014
  • In this paper, we propose an automatic modulation classification method for ten digitally modulated baseband signals, such as 2-FSK, 4-FSK, 8-FSK, MSK, BPSK, QPSK, 8-PSK, 16-QAM, 32-QAM, and 64-QAM based on higher order statistics of cyclostationary process. The first order cyclic moments and higher order cyclic cumulants of the signal are used as features of the modulation signals. The proposed method consists of two stages. At the first stage, we classify modulation signals as M-FSK and non-FSK using peaks of the first order cyclic moment. At the next step, we apply the Gaussian mixture model-based classifier to classify non-FSK. Simulation results are demonstrated to evaluate the proposed scheme. The results show high probability of classification even in the presence of frequency and phase offsets.