Search | Korea Science

Gaussian Selection in HMM Speech Recognizer with PTM Model for Efficient Decoding (PTM 모델을 사용한 HMM 음성인식기에서 효율적인 디코딩을 위한 가우시안 선택기법)

손종목;정성윤;배건성
- The Journal of the Acoustical Society of Korea / v.23 no.1 / pp.75-81 / 2004
Gaussian selection (GS) is a popular approach to fast decoding with continuous density hidden Markov models. It speeds up likelihood computation by reducing the number of Gaussian components that must be evaluated. In this paper, we propose a new GS method for phonetic tied-mixture (PTM) hidden Markov models. The PTM model represents all states at the same topological location with a shared set of Gaussian mixture components and context-dependent weights. The proposed method therefore imposes constraints on the weights as well as on the number of Gaussian components to reduce the computational load. Experimental results show that the proposed method reduces the percentage of Gaussian computation to 16.41%, compared with 20-30% for conventional GS methods, with little degradation in recognition.
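The idea of restricting both the weights and the number of evaluated Gaussians can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the shortlist rule (drop components whose state weight is below `min_weight`, then keep the `max_components` nearest means), the function names, and the toy codebook are all assumptions made for illustration, and positive weights are assumed.

```python
import math

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at x."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )

def select_components(x, means, weights, max_components, min_weight):
    """Hypothetical shortlist rule: drop low-weight components, keep the nearest few."""
    candidates = [k for k, w in enumerate(weights) if w >= min_weight]
    if not candidates:
        # Fall back to the single heaviest component so the shortlist is never empty.
        candidates = [max(range(len(weights)), key=lambda k: weights[k])]
    candidates.sort(key=lambda k: sum((xi - m) ** 2 for xi, m in zip(x, means[k])))
    return candidates[:max_components]

def state_log_likelihood(x, means, variances, weights, shortlist=None):
    """Log-sum-exp of weighted component densities over the shortlist (or all components)."""
    ks = range(len(weights)) if shortlist is None else shortlist
    terms = [math.log(weights[k]) + log_gauss_diag(x, means[k], variances[k]) for k in ks]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))

# Toy shared codebook (assumed values) with the mixture weights of one state.
means = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0], [-4.0, 3.0]]
variances = [[1.0, 1.0]] * 4
weights = [0.6, 0.3, 0.09, 0.01]
x = [0.1, -0.2]

shortlist = select_components(x, means, weights, max_components=2, min_weight=0.05)
full = state_log_likelihood(x, means, variances, weights)
approx = state_log_likelihood(x, means, variances, weights, shortlist)
print(shortlist, f"{100 * len(shortlist) / len(weights):.0f}% of Gaussians evaluated")
print("log-likelihood lost by selection:", full - approx)
```

In this toy case only half of the Gaussians are evaluated and the log-likelihood changes very little, which is the trade-off the abstract reports in terms of computation percentage versus recognition degradation.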

A Realization of Injurious moving picture filtering system with Gaussian Mixture Model and Frame-level Likelihood Estimation (Gaussian Mixture Model과 프레임 단위 유사도 추정을 이용한 유해동영상 필터링 시스템 구현)

Kim, Min-Joung;Jeong, Jong-Hyeog
- Journal of the Korean Institute of Intelligent Systems / v.23 no.2 / pp.184-189 / 2013
In this paper, we propose a filtering system for injurious moving pictures, which are distributed without restriction on the internet and in internet storage space. The system filters them using certain sounds that they contain. For this purpose, a Gaussian Mixture Model, which represents the characteristics of the sound well, is used, and frame-level likelihood estimation is used to calculate the likelihood between the filtering target data and the sound models. In addition, a pruning method that reduces the number of data comparisons is applied for real-time processing, and the MWMR method, which has shown good performance in speaker identification, is applied for high-precision discrimination. In the identification experiments, when the frame rate (the proportion of high-likelihood frames among all frames) is set to 50%, the identification error rate is 6.06%; when the frame rate is set to 60%, the error rate is 3.03%. These results show that the proposed system can effectively distinguish between general and injurious moving pictures.
https://doi.org/10.5391/JKIIS.2013.23.2.184

A Study on the PMC Adaptation for Speech Recognition under Noisy Conditions (잡음 환경에서의 음성인식을 위한 PMC 적응에 관한 연구)

김현기
- Journal of Korea Society of Industrial Information Systems / v.7 no.3 / pp.9-14 / 2002
In this paper, we propose a method for enhancing the performance of a speech recognizer under noisy conditions. The parallel combination model of the PMC method, which uses multiple Gaussian-distributed mixtures, is adapted to the variation of each mixture. CDHMMs (continuous observation density HMMs) with multiple Gaussian-distributed mixtures are combined by the proposed PMC method. In addition, the EM (expectation maximization) algorithm is used to adapt the model mean parameters in order to reduce the variation of the mixture density. Simulation results show that the proposed PMC adaptation method performs better than the conventional PMC method.

Emergency Detection Method using Motion History Image for a Video-based Intelligent Security System

Lee, Jun;Lee, Se-Jong;Park, Jeong-Sik;Seo, Yong-Ho
- International journal of advanced smart convergence / v.1 no.2 / pp.39-42 / 2012
This paper proposes a method that detects emergency situations in a video stream using MHI (Motion History Image) and template matching for a video-based intelligent security system. The proposed method creates an MHI of each human object through image processing techniques such as background removal based on GMM (Gaussian Mixture Model), labeling, and accumulation of the foreground images. The obtained MHI is then compared with existing MHI templates to detect an emergency situation. To evaluate the proposed emergency detection method, a set of experiments was conducted on a dataset of video clips captured from a security camera, and emergency situations were successfully detected using the proposed method. In addition, the implemented system provides MMS (Multimedia Message Service) so that a security manager can deal with the emergency situation appropriately.
https://doi.org/10.7236/JASC2012.1.2.8
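The accumulation of foreground images into an MHI and the comparison against templates can be sketched in a few lines of Python. The abstract does not give the update rule or the matching measure, so this sketch assumes a common MHI formulation (foreground pixels are set to a duration `tau`, all others decay by one) and a sum of absolute differences as a stand-in for the template matching step. The function names and the tiny 1x4 "frames" are hypothetical, and the foreground masks are taken as given rather than computed with a GMM background model.

```python
def update_mhi(mhi, foreground, tau):
    """One MHI step: foreground pixels are set to tau, the rest decay by 1 down to 0."""
    return [
        [tau if fg else max(0, value - 1) for value, fg in zip(mhi_row, fg_row)]
        for mhi_row, fg_row in zip(mhi, foreground)
    ]

def template_distance(mhi, template):
    """Sum of absolute differences between an MHI and a stored template (assumed measure)."""
    return sum(abs(a - b) for ra, rb in zip(mhi, template) for a, b in zip(ra, rb))

# A foreground blob moving left to right across a 1x4 image (assumed masks).
masks = [[[1, 0, 0, 0]], [[0, 1, 0, 0]], [[0, 0, 1, 0]]]
mhi = [[0, 0, 0, 0]]
for mask in masks:
    mhi = update_mhi(mhi, mask, tau=3)

# Hypothetical templates: recent motion is brightest, older motion has faded.
templates = {"left_to_right": [[1, 2, 3, 0]], "right_to_left": [[3, 2, 1, 0]]}
best = min(templates, key=lambda name: template_distance(mhi, templates[name]))
print(mhi, best)
```

The resulting image encodes the direction of motion in its intensity gradient, which is what makes a simple per-template comparison informative.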

A novel Neuro Fuzzy Modeling using Gaussian Mixture Models

Kim, Sung-Suk;Kwak, Keun-Chang;Kim, Sung-Soo;Chun, Myung-Geun;Ryu, Jeong-Woong
- 제어로봇시스템학회:학술대회논문집 / 2002.10a / pp.110.1-110 / 2002
We propose a novel neuro-fuzzy system based on an efficient clustering method. The method improves the performance of a fuzzy model that has a small number of fuzzy rules. Fuzzy clustering methods have been studied widely in fuzzy modeling. One of them, the grid partition method, has the problem that the number of rules increases exponentially when the input dimension or the number of membership functions increases linearly. The Expectation Maximization algorithm, on the other hand, is an efficient estimation method for the unknown parameters of a Gaussian mixture model. Here it is noted that these parameters can be used for a fuzzy clustering method. In fuzzy modeling, it is desired that...
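To make the link between EM and fuzzy clustering concrete, here is a minimal sketch of EM for a one-dimensional Gaussian mixture in Python. It is a textbook formulation, not the neuro-fuzzy system of the paper; the data, the initial parameters, and the variance floor of `1e-6` are assumptions. The responsibilities computed in the E-step sum to one for each sample, so they can be read as soft (fuzzy) cluster memberships.

```python
import math

def normal_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def em_gmm_1d(data, means, variances, weights, iterations=50):
    """Plain EM for a 1-D Gaussian mixture; returns parameters and final E-step responsibilities."""
    means, variances, weights = list(means), list(variances), list(weights)
    n, k = len(data), len(means)
    resp = []
    for _ in range(iterations):
        # E-step: posterior probability of each component for each sample.
        resp = []
        for x in data:
            joint = [weights[j] * normal_pdf(x, means[j], variances[j]) for j in range(k)]
            total = sum(joint)
            resp.append([p / total for p in joint])
        # M-step: re-estimate weights, means and variances from the responsibilities.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            spread = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(spread, 1e-6)  # assumed floor to avoid collapse
    return means, variances, weights, resp

data = [0.9, 1.0, 1.1, 4.9, 5.0, 5.1]
means, variances, weights, memberships = em_gmm_1d(
    data, means=[0.0, 6.0], variances=[1.0, 1.0], weights=[0.5, 0.5]
)
print(means, weights)
print(memberships[0])  # soft membership of the first sample in each cluster
```

Unlike a grid partition, the number of clusters here is chosen directly and does not grow with the input dimension, which is the property the abstract contrasts against.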

Non-Gaussian time-dependent statistics of wind pressure processes on a roof structure

Huang, M.F.;Huang, Song;Feng, He;Lou, Wenjuan
- Wind and Structures / v.23 no.4 / pp.275-300 / 2016
Synchronous multi-pressure measurements of relatively long duration were carried out for a double-layer reticulated shell roof model in an atmospheric boundary layer wind tunnel. Since the long roof is open at both ends for the storage of coal piles, three testing cases were considered: the empty roof without coal piles (Case A), half coal piles inside (Case B), and full coal piles inside (Case C). Based on the wind tunnel test results, non-Gaussian time-dependent statistics of net wind pressure on the shell roof were quantified in terms of skewness and kurtosis. It was found that the direct statistical estimation of high-order moments and peak factors is quite sensitive to the duration of the wind pressure time-history data. The maximum value of the COVs (coefficients of variation) of high-order moments is up to 1.05 for several measured pressure processes. Mixture distribution models are proposed to better model the distribution of a parent pressure process. With the aid of mixture parent distribution models, the existing translated-peak-process (TPP) method has been revised and improved for the estimation of non-Gaussian peak factors. Finally, non-Gaussian peak factors of wind pressure, particularly for the observed hardening pressure processes, were calculated by employing various state-of-the-art methods and compared to the direct statistical analysis of the measured long-duration wind pressure data. The estimated non-Gaussian peak factors for a hardening pressure process at the leading edge of the roof were 3.6229, 3.3693 and 3.3416 for Cases A, B and C, respectively.
https://doi.org/10.12989/was.2016.23.4.275
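The quantities named in this abstract (skewness, kurtosis, and the coefficient of variation of an estimate) can be computed as in the Python sketch below. It uses population (biased) moment estimators, which is an assumption, since the abstract does not say which estimators the authors used. The sample values are made up for illustration.

```python
import math

def moments(samples):
    """Return mean, standard deviation, skewness and (non-excess) kurtosis of the samples."""
    n = len(samples)
    mean = sum(samples) / n
    m2 = sum((x - mean) ** 2 for x in samples) / n
    m3 = sum((x - mean) ** 3 for x in samples) / n
    m4 = sum((x - mean) ** 4 for x in samples) / n
    return mean, math.sqrt(m2), m3 / m2 ** 1.5, m4 / m2 ** 2

def coefficient_of_variation(estimates):
    """Standard deviation of a set of estimates divided by their mean."""
    mean, std, _, _ = moments(estimates)
    return std / mean

# Made-up pressure samples; a Gaussian process would give skewness 0 and kurtosis 3.
mean, std, skewness, kurtosis = moments([1.0, 2.0, 3.0, 4.0, 5.0])
print(skewness, kurtosis)

# Spread of one statistic estimated from several record segments (made-up values).
print(coefficient_of_variation([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]))
```

A kurtosis below 3 is what is usually meant by a hardening process, and a large COV across segments reflects the sensitivity to record duration that the abstract describes.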

Gaussian Density Selection Method of CDHMM in Speaker Recognition (화자인식에서 연속밀도 은닉마코프모델의 혼합밀도 결정방법)

서창우;이주헌;임재열;이기용
- The Journal of the Acoustical Society of Korea / v.22 no.8 / pp.711-716 / 2003
This paper proposes a method for selecting the optimal number of mixtures in each state of a continuous density HMM (hidden Markov model). Previously, researchers used the same number of mixture components in each state of the HMM regardless of the spectral characteristics of the speaker. To model each speaker as accurately as possible, we propose using a different number of mixture components for each state. The selection of mixture components considers the probability value of each mixture in each state, which strongly affects parameter estimation of the continuous density HMM. We also use PCA (principal component analysis) to reduce the correlation and keep the system stable when the number of mixture components is reduced. In the experiments, the proposed method used on average 10% fewer mixture components than the conventional HMM. When only the selection of mixture components was applied, the proposed method achieved similar performance. When principal component analysis was used, the 16th-order feature vector showed an average performance decrease of 0.35%, and the 25th-order feature vector showed an average performance improvement of 0.65%.

Quality Improvement of Bandwidth Extended Speech Using Mixed Excitation Model (혼합여기모델을 이용한 대역 확장된 음성신호의 음질 개선)

Choi Mu Yeol;Kim Hyung Soon
- MALSORI / no.52 / pp.133-144 / 2004
The quality of narrowband speech can be enhanced by bandwidth extension technology. This paper proposes a mixed excitation and an energy compensation method based on the Gaussian Mixture Model (GMM). First, we employ a mixed excitation model that has both periodic and aperiodic characteristics in the frequency domain. We use a filter bank to extract the periodicity features from the filtered signals and model them with a GMM to estimate the mixed excitation. Second, we separate the acoustic space into the voiced and unvoiced parts of speech to compensate more accurately for the energy difference between narrowband speech and reconstructed highband, or lowband speech. Objective and subjective evaluations show that the quality of wideband speech reconstructed by the proposed method is superior to that of the conventional bandwidth extension method.

Detection of Pathological Voice Using Linear Discriminant Analysis

Lee, Ji-Yeoun;Jeong, Sang-Bae;Choi, Hong-Shik;Hahn, Min-Soo
- MALSORI / no.64 / pp.77-88 / 2007
Nowadays, mel-frequency cepstral coefficients (MFCCs) and Gaussian mixture models (GMMs) are used for pathological voice detection. This paper suggests a method to improve the performance of pathological/normal voice classification based on the MFCC-based GMM. We analyze the characteristics of the mel frequency-based filterbank energies using the Fisher discriminant ratio (FDR). Feature vectors are then obtained through the linear discriminant analysis (LDA) transformation of the filterbank energies (FBE) and the MFCCs. Accuracy is measured by the GMM classifier. This paper shows that the FBE LDA-based GMM is a sufficiently distinct method for pathological/normal voice classification, with a 96.6% classification performance rate. The proposed method performs better than the MFCC-based GMM, with a noticeable improvement of 54.05% in terms of error reduction.
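The "error reduction" figure is a relative measure, so it helps to work through the arithmetic. The Python sketch below assumes the usual definition, (baseline error - new error) / baseline error, which the abstract does not spell out. The baseline accuracy it derives is implied by the two reported numbers under that assumption, not a value stated in the abstract.

```python
def relative_error_reduction(baseline_accuracy, new_accuracy):
    """Relative reduction of the error rate, in percent (accuracies given in percent)."""
    baseline_error = 100.0 - baseline_accuracy
    new_error = 100.0 - new_accuracy
    return 100.0 * (baseline_error - new_error) / baseline_error

def implied_baseline_accuracy(new_accuracy, reduction):
    """Baseline accuracy implied by a new accuracy and a relative error reduction (percent)."""
    new_error = 100.0 - new_accuracy
    return 100.0 - new_error / (1.0 - reduction / 100.0)

# Reported: 96.6% classification rate and 54.05% error reduction.
baseline = implied_baseline_accuracy(96.6, 54.05)
print(round(baseline, 1))                                  # implied baseline accuracy
print(round(relative_error_reduction(baseline, 96.6), 2))  # recovers the reported reduction
```

Under this definition, a 96.6% rate corresponds to a 3.4% error rate, and a 54.05% reduction implies a baseline error rate of roughly 7.4%.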

Performance Improvement of Classification Between Pathological and Normal Voice Using HOS Parameter (HOS 특징 벡터를 이용한 장애 음성 분류 성능의 향상)

Lee, Ji-Yeoun;Jeong, Sang-Bae;Choi, Hong-Shik;Hahn, Min-Soo
- MALSORI / no.66 / pp.61-72 / 2008
This paper proposes a method to improve pathological and normal voice classification performance by combining multiple features, such as auditory-based and higher-order features. Their performance is measured with Gaussian mixture models (GMMs) and linear discriminant analysis (LDA). The combination of multiple features with the frame-based LDA method is shown to be effective for pathological and normal voice classification, with an 87.0% classification rate. This is a noticeable improvement of 17.72% in terms of error reduction compared to the MFCC-based GMM algorithm.

Search Result 302, Processing Time 0.026 seconds
