Integrated Search | Korea Science

Double Compensation Framework Based on GMM for Speaker Recognition

김유진;정재호
- 대한음성학회지:말소리 / No. 45 / pp.93-105 / 2003
In this paper, we present a single framework based on GMM for speaker recognition. The proposed framework can simultaneously minimize environmental variations under mismatched conditions and adapt the bias-free, speaker-dependent characteristics of claimant utterances to the background GMM to create a speaker model. We compare closed-set speaker identification with the conventional method and the proposed method on both TIMIT and NTIMIT. Across several sets of experiments, recognition rates improve by 7.2% on a simulated channel and by 27.4% on a telephone channel condition.

Performance of GMM and ANN as a Classifier for Pathological Voice

Wang, Jianglin;Jo, Cheol-Woo
- 음성과학 / Vol. 14, No. 1 / pp.151-162 / 2007
This study focuses on classifying pathological voice using a GMM (Gaussian Mixture Model) and compares the results with previous work that used an ANN (Artificial Neural Network). Speech data from normal speakers and patients were collected, diagnosed, and classified into two categories. Six characteristic parameters (Jitter, Shimmer, NHR, SPI, APQ, and RAP) were chosen. Classification methods based on the artificial neural network and the Gaussian mixture model were then employed to discriminate the data into normal and pathological speech. The GMM method attained a 98.4% average correct classification rate on training data and 95.2% on test data. Different numbers of mixtures (3 to 15) were used to find an optimal condition for classification. We also compared the average classification rates of GMM, ANN, and HMM. The proper number of Gaussian mixtures remains to be investigated in future work.

A study on Gaussian mixture model deep neural network hybrid-based feature compensation for robust speech recognition in noisy environments

윤기무;김우일
- 한국음향학회지 / Vol. 37, No. 6 / pp.506-511 / 2018
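This paper proposes a feature compensation technique based on a GMM (Gaussian Mixture Model)-DNN (Deep Neural Network) hybrid for effective speech recognition in noisy environments. The posterior probabilities required by conventional GMM-based feature compensation are computed with a DNN. In a speech recognition evaluation on Aurora 2.0 data, the proposed GMM-DNN hybrid technique outperforms the conventional GMM-based technique on average in both Known and Unknown noise environments. In particular, it achieves a 9.13% relative improvement in average error rate in Unknown noise environments and performs considerably better in low-SNR (Signal to Noise Ratio) noise environments.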
https://doi.org/10.7776/ASK.2018.37.6.506
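
The posterior probabilities that the abstract above refers to are, in a conventional GMM, the per-component responsibilities for an observed feature. The following is a minimal Python sketch under stated assumptions: features are one-dimensional, and the weights, means, and variances are hypothetical values that do not come from the paper. In the hybrid scheme described, a DNN would supply these posteriors instead; that network is not shown here.

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def gmm_posteriors(x, weights, means, variances):
    """Posterior probability of each mixture component given observation x."""
    # Joint term w_k * N(x; mu_k, var_k) for every component k.
    joint = [w * gaussian_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
    # Normalizing by the sum turns the joint terms into posteriors.
    total = sum(joint)
    return [j / total for j in joint]

# Hypothetical two-component model (illustrative values only).
posteriors = gmm_posteriors(0.0, [0.5, 0.5], [-1.0, 1.0], [1.0, 1.0])
```

Because the observation sits midway between two otherwise identical components, both posteriors come out equal. A practical implementation would work in the log domain to avoid underflow for observations far from every component.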

Performance Comparison of GMM and HMM Approaches for Bandwidth Extension of Speech Signals

송근배;김석호
- 한국음향학회지 / Vol. 27, No. 3 / pp.119-128 / 2008
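This paper analyzes the relationship between, and compares the performance of, the Gaussian Mixture Model (GMM) method and the Hidden Markov Model (HMM) method, two representative statistical approaches to Bandwidth Extension (BWE). Unlike the GMM method, the HMM method is a system with memory: it models the correlation between adjacent speech frames and exploits it in the BWE system. It can therefore be expected to estimate the frame-to-frame spectral variation of the original signal more accurately. To verify this, a dynamic measure related to the first derivative of the speech spectrum was applied in addition to static measures. In the evaluation, the two methods performed comparably on the static measures, but the HMM method performed better on the dynamic measure. This difference was also found to grow in proportion to the number of states in the HMM. These results suggest that the HMM method is a suitable solution at least for the 'blind BWE' problem. Although it falls behind on the dynamic measure, the GMM method has the advantage of being relatively simple, and the fact that it matches the HMM method on static measures suggests that it can be an effective alternative to the HMM method in some applications.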
https://doi.org/10.7776/ASK.2008.27.3.119

Acoustic Model Transformation Method for Speech Recognition Employing Gaussian Mixture Model Adaptation Using Untranscribed Speech Database

김우일
- 한국정보통신학회논문지 / Vol. 19, No. 5 / pp.1047-1054 / 2015
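This paper describes an effective acoustic model transformation technique that uses an untranscribed speech database to improve speech recognition performance. In the described technique, a GMM adapted to the environment is obtained using an existing adaptation technique. Gaussian components similar to those of the HMM are selected, transformation vectors for the selected Gaussian components are computed, and these are used to transform the mean parameters. When combined with the conventional MAP and MLLR adaptation techniques, the GMM adaptation-based model transformation yields higher speech recognition performance in car noise and speech babble noise environments than MAP or MLLR alone. In online acoustic model adaptation experiments, combining it with MLLR also adapts the model more effectively than MLLR alone. These results demonstrate that adopting the GMM adaptation-based model transformation technique makes it possible to use an untranscribed speech database effectively for acoustic model adaptation.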
https://doi.org/10.6109/jkiice.2015.19.5.1047

Estimation of the Capital Asset Pricing Model Using GMM

이주희;남주하
- 재무관리연구 / Vol. 9, No. 2 / pp.57-75 / 1992
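This paper estimates beta for 10 size-sorted assets using GMM (generalized method of moments), a recently developed econometric technique. Monthly data covering 1982.1 to 1991.4 are used. The empirical results show that, for assets sorted by firm size, smaller firms have relatively smaller betas than larger firms. The instrumental variables used for GMM estimation are the spread between the corporate bond yield and the time deposit rate, own lags of the returns on the assets under analysis and on the market portfolio, and a constant. The results show that GMM estimation can be superior to OLS estimation of the CAPM. This appears to be because the instrumental variables used in GMM provided useful proxy variables for CAPM estimation beyond the own lags of the related assets, and because GMM accounts well for conditional heteroskedasticity as well as autocorrelation in the residuals. The t-values and p-values suggest that a simple CAPM estimated with GMM may fit the actual Korean economy well.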

Gaussian Mixture Model using Minimum Classification Error for Environmental Sounds Recognition Performance Improvement

한다정;박아론;박준규;백성준
- 한국콘텐츠학회논문지 / Vol. 11, No. 12 / pp.497-503 / 2011
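This study proposes introducing MCE into GMM training to improve environmental sound recognition. When defining the classification error function used to model the environmental sound data, the approach considers not only the log-likelihood of the target class but also the log-likelihoods of the other classes, which enables more discriminative classification. The model parameters were estimated by defining a loss function that takes all classes into account and applying the GPD (generalized probabilistic descent) algorithm. To compare recognition performance, 12-dimensional features were extracted from nine kinds of environmental sounds through preprocessing and MFCC (mel-frequency cepstral coefficients), and GMM classification experiments were run for different numbers of mixture components. With 19 mixture components, MCE training performed best, with an average recognition rate of 87.06%. This result confirms that the proposed MCE training can be used effectively as a training method for GMMs in environmental sound recognition.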
https://doi.org/10.5392/JKCA.2011.11.12.497
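
The idea of weighing the target class's log-likelihood against those of the other classes can be illustrated with one widely used form of the MCE misclassification measure and a sigmoid-smoothed loss. This is a sketch under assumptions: the abstract does not give the paper's exact definitions, and the smoothing parameters `eta` and `gamma` as well as the example scores are hypothetical.

```python
import math

def misclassification_measure(scores, k, eta=1.0):
    """d_k = -g_k + (1/eta) * log(mean over j != k of exp(eta * g_j)).

    scores holds one class log-likelihood g_j per class; k is the true class.
    A negative value means the true class outscores its rivals.
    """
    rivals = [g for j, g in enumerate(scores) if j != k]
    peak = max(rivals)  # shift by the maximum for numerical stability
    mean_exp = sum(math.exp(eta * (g - peak)) for g in rivals) / len(rivals)
    return -scores[k] + peak + math.log(mean_exp) / eta

def smoothed_loss(d, gamma=1.0):
    """Sigmoid of the misclassification measure: a differentiable 0/1 loss."""
    return 1.0 / (1.0 + math.exp(-gamma * d))

# Hypothetical per-class log-likelihoods for one sample whose true class is 0.
scores = [-10.0, -12.0, -15.0]
d_true = misclassification_measure(scores, 0)
loss_true = smoothed_loss(d_true)
```

Since the loss is smooth in the class scores, it can be minimized by gradient-based updates such as GPD; the parameter update step itself is omitted from this sketch.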

Tracking and Face Recognition of Multiple People Based on GMM, LKT and PCA

Lee, Won-Oh;Park, Young-Ho;Lee, Eui-Chul;Lee, Hee-Kyung;Park, Kang-Ryoung
- 한국멀티미디어학회논문지 / Vol. 15, No. 4 / pp.449-471 / 2012
In intelligent surveillance systems, multiple people must be tracked robustly. Most previous studies adopted a Gaussian mixture model (GMM) to discriminate the object from the background. However, its performance is affected by illumination variations, and shadow regions can be merged with the object. In addition, when two foreground objects overlap, the GMM method cannot correctly discriminate the occluded regions. To overcome these problems, we propose a new method of tracking and identifying multiple people. The proposed research is novel in three ways compared to previous research. First, illumination variations and shadow regions are reduced by an illumination normalization based on median and inverse filtering of the L*a*b* image. Second, multiple occluded and overlapping people are tracked by combining the GMM in the still image with the Lucas-Kanade-Tomasi (LKT) method in successive images. Third, with the proposed human tracking and existing face detection and recognition methods, the tracked people are successfully identified. The experimental results show that the proposed method can track and recognize multiple people accurately.
https://doi.org/10.9717/kmms.2012.15.4.449

Character-Based Video Summarization Using Speaker Identification

이순탁;김종성;강찬미;백중환
- 융합신호처리학회논문지 / Vol. 6, No. 4 / pp.163-168 / 2005
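This paper proposes a character-based video summarization method that applies speaker identification to the audio in a video. First, speech corresponding to the actors' lines is separated from the video, focusing on scenes that contain face regions, and speaker identification is performed to classify the speech by character. The speaker identification extracts MFCC (Mel Frequency Cepstrum Coefficient) values for each speaker and classifies them using a GMM (Gaussian Mixture Model). GMMs were trained for four characters, and an experiment detecting one of the four showed that the trained GMM classifier has a misclassification rate of about 0.138 on the test video.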
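
Classifying speech with one GMM per speaker usually means scoring an utterance under each model and picking the highest total log-likelihood. The sketch below assumes that decision rule, which the abstract does not spell out. It uses one-dimensional stand-ins for MFCC frames, and the speaker names and model parameters are hypothetical; training is not shown.

```python
import math

def log_gaussian(x, mean, var):
    """Log density of a 1-D Gaussian."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)

def gmm_log_likelihood(frames, gmm):
    """Sum over frames of log(sum_k w_k * N(x; mu_k, var_k)).

    gmm is a list of (weight, mean, variance) tuples.
    """
    total = 0.0
    for x in frames:
        terms = [math.log(w) + log_gaussian(x, m, v) for w, m, v in gmm]
        peak = max(terms)  # log-sum-exp shift for numerical stability
        total += peak + math.log(sum(math.exp(t - peak) for t in terms))
    return total

def identify_speaker(frames, models):
    """Return the name of the model with the highest utterance log-likelihood."""
    return max(models, key=lambda name: gmm_log_likelihood(frames, models[name]))

# Hypothetical, already-trained speaker models (illustrative values only).
models = {
    "speaker_a": [(0.6, -2.0, 1.0), (0.4, -1.0, 0.5)],
    "speaker_b": [(0.5, 2.0, 1.0), (0.5, 3.0, 0.5)],
}
predicted = identify_speaker([1.8, 2.5, 3.1], models)
```

With real MFCC vectors, each component would be a multivariate Gaussian, but the argmax over per-speaker scores stays the same.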

Detection of Pathological Voice Using Linear Discriminant Analysis

Lee, Ji-Yeoun;Jeong, Sang-Bae;Choi, Hong-Shik;Hahn, Min-Soo
- 대한음성학회지:말소리 / No. 64 / pp.77-88 / 2007
Nowadays, mel-frequency cepstral coefficients (MFCCs) and Gaussian mixture models (GMMs) are used for pathological voice detection. This paper suggests a method to improve pathological/normal voice classification based on the MFCC-based GMM. We analyze the characteristics of the mel frequency-based filterbank energies using the Fisher discriminant ratio (FDR). Feature vectors are then obtained by applying the linear discriminant analysis (LDA) transformation to the filterbank energies (FBE) and the MFCCs. Accuracy is measured with the GMM classifier. This paper shows that the FBE LDA-based GMM is a sufficiently distinct method for pathological/normal voice classification, with a 96.6% classification performance rate. The proposed method outperforms the MFCC-based GMM, with a noticeable improvement of 54.05% in terms of error reduction.
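
An error-reduction figure relates two error rates rather than two accuracies. The sketch below assumes the 54.05% is a relative reduction in classification error against the MFCC-based GMM, which the abstract suggests but does not define. The implied baseline error is an inference from that assumption, not a number reported in the abstract.

```python
def relative_error_reduction(baseline_error, new_error):
    """Fraction of the baseline error removed by the new method."""
    return (baseline_error - new_error) / baseline_error

def implied_baseline_error(new_error, reduction):
    """Invert the relation above: baseline = new / (1 - reduction)."""
    return new_error / (1.0 - reduction)

# 96.6% accuracy corresponds to a 3.4% error rate for the proposed method.
new_error = 100.0 - 96.6
# Assumption: 54.05% is a relative error reduction over the baseline.
baseline_error = implied_baseline_error(new_error, 0.5405)
```

Under this reading, the baseline error would be roughly 7.4%, that is, a baseline accuracy near 92.6%.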

