Search | Korea Science

SVM Based Speaker Verification Using Sparse Maximum A Posteriori Adaptation

Kim, Younggwan;Roh, Jaeyoung;Kim, Hoirin
- IEIE Transactions on Smart Processing and Computing / v.2 no.5 / pp.277-281 / 2013
Modern speaker verification systems based on support vector machines (SVMs) use Gaussian mixture model (GMM) supervectors as their input feature vectors, and maximum a posteriori (MAP) adaptation is the conventional method for generating speaker-dependent GMMs by adapting a universal background model (UBM). Because of the number of model parameters to be estimated, MAP adaptation requires a sufficient amount of input speech. With limited utterances, MAP adaptation becomes unreliable and introduces adaptation noise, even though the Bayesian priors used in MAP adaptation smooth the movement between the UBM and the speaker-dependent GMMs. This paper proposes applying sparse MAP adaptation, a method known to perform well in automatic speech recognition, to a GMM-SVM-based speaker verification system, where it effectively mitigates the adaptation noise. The proposed method uses the L0 norm as a regularizer to induce sparsity. Experimental results on the TIMIT database show that the sparse MAP-based GMM-SVM speaker verification system yields a 42.6% relative reduction in the equal error rate (EER) with little additional computation.
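The relevance-MAP mean update and an L0-regularized (sparse) variant can be sketched as follows. This is a minimal sketch, assuming a diagonal-covariance UBM, the usual relevance-factor form of MAP mean adaptation, and a per-parameter L0 penalty. Under those assumptions the L0 term reduces to a hard threshold: a mean moves to its MAP value only when the resulting gain in the MAP objective exceeds the penalty. The function names, the `relevance` and `l0_penalty` parameters, and the sqrt(weight)/sigma supervector scaling are illustrative assumptions, not the paper's exact formulation.

```python
import math


def map_adapt_means(ubm_means, ubm_vars, counts, first_order,
                    relevance=16.0, l0_penalty=0.0):
    """Relevance-MAP mean adaptation with an optional L0 (hard-threshold) penalty.

    ubm_means, ubm_vars: K lists of D floats (diagonal-covariance UBM).
    counts: K zeroth-order statistics n_k = sum_t gamma_k(t).
    first_order: K lists of D floats, f_k = sum_t gamma_k(t) * x_t.
    relevance: relevance factor r (16 is a common choice, assumed here).
    """
    adapted = []
    for mu_k, var_k, n_k, f_k in zip(ubm_means, ubm_vars, counts, first_order):
        row = []
        for mu, var, f in zip(mu_k, var_k, f_k):
            # Dense MAP estimate: interpolation between the data and the UBM mean.
            m_star = (f + relevance * mu) / (n_k + relevance)
            # Increase in the MAP objective obtained by moving mu to m_star.
            gain = (n_k + relevance) * (m_star - mu) ** 2 / (2.0 * var)
            # L0 penalty: a parameter moves only if its gain exceeds the cost.
            row.append(m_star if gain > l0_penalty else mu)
        adapted.append(row)
    return adapted


def supervector(adapted_means, ubm_weights, ubm_vars):
    """Stack adapted means into one SVM input vector.

    The sqrt(w_k) / sigma scaling is a common GMM-SVM choice (assumption).
    """
    return [math.sqrt(w) * m / math.sqrt(v)
            for means, w, var_k in zip(adapted_means, ubm_weights, ubm_vars)
            for m, v in zip(means, var_k)]


if __name__ == "__main__":
    # Toy statistics for a 2-component, 2-dimensional UBM (made-up numbers).
    means = [[0.0, 0.0], [1.0, 1.0]]
    variances = [[1.0, 1.0], [1.0, 1.0]]
    counts = [4.0, 0.5]
    stats = [[4.0, 0.2], [1.0, 0.5]]
    print(map_adapt_means(means, variances, counts, stats, relevance=4.0))
    print(map_adapt_means(means, variances, counts, stats,
                          relevance=4.0, l0_penalty=0.1))
```

With `l0_penalty=0` the update is ordinary MAP. Raising the penalty leaves more UBM means untouched, so small moves estimated from little data are suppressed rather than merely shrunk.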

L1-norm Regularization for State Vector Adaptation of Subspace Gaussian Mixture Model (L1-norm regularization을 통한 SGMM의 state vector 적응)

Goo, Jahyun;Kim, Younggwan;Kim, Hoirin
- Phonetics and Speech Sciences / v.7 no.3 / pp.131-138 / 2015
In this paper, we propose L1-norm regularization for state vector adaptation of the subspace Gaussian mixture model (SGMM). When designing a speaker adaptation system with a GMM-HMM acoustic model, MAP is the most typical technique to consider. However, MAP adaptation must update a large number of parameters simultaneously. Sparse adaptation, such as L1-norm regularization or sparse MAP, can be adopted to cope with this, but its performance is not as good as that of MAP adaptation. SGMM, however, suffers much less from sparse adaptation than GMM-HMM does, because each Gaussian mean vector in an SGMM is defined as a weighted sum of basis vectors, which makes it much more robust to fluctuations in the parameters. Since only a few adaptation techniques are suitable for SGMM, the proposed method could be especially effective when the amount of adaptation data is limited. Experimental results show that the error reduction rate of the proposed method is better than that of MAP adaptation of SGMM, even with a small amount of adaptation data.
https://doi.org/10.13064/KSSS.2015.7.3.131
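An L1-regularized state-vector update can be sketched with iterative soft thresholding (ISTA). This sketch assumes the adaptation statistics have already been reduced to a quadratic auxiliary function Q(d) = g^T d - 0.5 d^T H d in the state-vector change d, with H positive definite, as in the usual SGMM state-vector update. The function names, `lam`, and the fixed iteration count are hypothetical, and ISTA is a generic L1 solver chosen for illustration, not necessarily the authors' optimizer.

```python
import math


def soft_threshold(x, tau):
    """Proximal operator of tau * |x|: shrink x toward zero by tau."""
    return math.copysign(max(abs(x) - tau, 0.0), x)


def l1_state_update(g, H, lam, iters=500):
    """Maximize g^T d - 0.5 d^T H d - lam * ||d||_1 over the state-vector change d."""
    dim = len(g)
    # Gershgorin bound: the max absolute row sum is >= the largest eigenvalue
    # of H, so its reciprocal is a safe ISTA step size.
    step = 1.0 / max(sum(abs(h) for h in row) for row in H)
    d = [0.0] * dim
    for _ in range(iters):
        # Gradient of the smooth quadratic part of the objective.
        grad = [g[i] - sum(H[i][j] * d[j] for j in range(dim)) for i in range(dim)]
        # Gradient ascent step followed by soft thresholding (the L1 prox).
        d = [soft_threshold(d[i] + step * grad[i], step * lam) for i in range(dim)]
    return d


if __name__ == "__main__":
    # Weak evidence in the second coordinate is zeroed out by the L1 term.
    print(l1_state_update([4.0, 0.5], [[2.0, 0.0], [0.0, 2.0]], lam=1.0))  # [1.5, 0.0]
```

With `lam=0` the loop converges to the unregularized solution H^-1 g; a positive `lam` sets weakly supported coordinates exactly to zero, which is the sparsity the abstract relies on.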

Probabilistic Bilinear Transformation Space-Based Joint Maximum A Posteriori Adaptation

Song, Hwa Jeon;Lee, Yunkeun;Kim, Hyung Soon
- ETRI Journal / v.34 no.5 / pp.783-786 / 2012
This letter proposes a more advanced joint maximum a posteriori (MAP) adaptation that uses a prior model based on a probabilistic scheme built on the bilinear transformation (BIT) concept. The proposed method not only has scalable parameters but also relies on a single prior distribution, without the heuristic parameters of the previous joint BIT-MAP method. Experimental results show that, regardless of the amount of adaptation data, the proposed method yields consistent improvement over the previous method.
https://doi.org/10.4218/etrij.12.0212.0054

Self-Adaptation Algorithm Based on Maximum A Posteriori Eigenvoice for Korean Connected Digit Recognition (한국어 연결 숫자음 인식을 위한 최대 사후 Eigenvoice에 근거한 자기적응 기법)

Kim Dong Kook;Jeon Hyung Bae
- The Journal of the Acoustical Society of Korea / v.23 no.8 / pp.590-596 / 2004
This paper presents a new self-adaptation algorithm based on maximum a posteriori (MAP) eigenvoice for Korean connected digit recognition. The proposed MAP eigenvoice is developed by introducing a probability density model for the eigenvoice coefficients. The proposed approach provides a unified framework that incorporates the prior model into conventional eigenvoice estimation. Because the self-adaptation system uses only the single utterance that is to be recognized as adaptation data, we use MAP eigenvoice, the most robust adaptation method for this setting. In a series of self-adaptation experiments on the Korean connected digit recognition task, we demonstrate that the proposed approach outperforms the conventional eigenvoice algorithm for small amounts of adaptation data.
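When the prior on the eigenvoice coefficients is Gaussian, the MAP eigenvoice estimate has a closed form. A minimal sketch, assuming diagonal covariances, a zero-mean Gaussian prior on the eigenvoice weights with precision matrix `prior_precision`, and a speaker mean supervector equal to the speaker-independent mean plus a linear combination of eigenvoices. The paper's actual density model for the coefficients may differ, and all names here are hypothetical. Setting the prior precision to zero recovers the conventional maximum-likelihood eigenvoice weights.

```python
def solve(A, b):
    """Solve A x = b with Gaussian elimination and partial pivoting."""
    n = len(b)
    M = [[float(a) for a in row] + [float(bi)] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def map_eigenvoice_weights(counts, first_order, means, variances,
                           eigenvoices, prior_precision):
    """Solve (A + P) w = b for the eigenvoice weights w.

    eigenvoices[k][d][r]: entry d of eigenvoice r restricted to Gaussian k.
    A = sum_k n_k E_k^T S_k^-1 E_k,  b = sum_k E_k^T S_k^-1 (f_k - n_k mu_k).
    """
    R = len(prior_precision)
    # Start from the prior precision P; data terms are accumulated on top.
    A = [[float(prior_precision[r][s]) for s in range(R)] for r in range(R)]
    b = [0.0] * R
    for n_k, f_k, mu_k, var_k, E_k in zip(counts, first_order, means,
                                          variances, eigenvoices):
        for f, mu, var, e in zip(f_k, mu_k, var_k, E_k):
            residual = (f - n_k * mu) / var
            for r in range(R):
                b[r] += e[r] * residual
                for s in range(R):
                    A[r][s] += n_k * e[r] * e[s] / var
    return solve(A, b)
```

The adapted means are then mu_k + E_k w. With a single short utterance the counts n_k are small, so in this sketch the prior term dominates A + P and pulls w toward zero, that is, toward the speaker-independent model.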

A Noble Decoding Algorithm Using MLLR Adaptation for Speaker Verification (MLLR 화자적응 기법을 이용한 새로운 화자확인 디코딩 알고리듬)

김강열;김지운;정재호
- The Journal of the Acoustical Society of Korea / v.21 no.2 / pp.190-198 / 2002
Conventionally, decoding in speaker verification uses the Viterbi algorithm from speech recognition. However, a speaker verification decoder must handle the same word differently for each speaker. In this paper, we propose a novel decoding algorithm that could replace the typical Viterbi algorithm in a speaker verification system. The proposed algorithm uses speaker adaptation algorithms from speech recognition that transform feature vectors toward the region of the client's characteristics. Among the many adaptation algorithms, we adopt MLLR (Maximum Likelihood Linear Regression) and MAP (Maximum A Posteriori) adaptation. Using the proposed algorithm instead of the typical Viterbi algorithm improved the EER (Equal Error Rate) by about 30%.
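The equal error rate is the operating point where the false acceptance rate equals the false rejection rate. A minimal sketch, assuming that higher scores mean "more likely the claimed speaker" and that sweeping the threshold over the observed scores is precise enough (evaluation tools often interpolate the DET curve instead). The function names are hypothetical. `relative_reduction` shows how a relative EER reduction is computed from two EERs.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Approximate EER by sweeping the threshold over every observed score."""
    best_gap, best_eer = float("inf"), None
    for t in sorted(set(target_scores) | set(nontarget_scores)):
        # A trial is accepted when its score is >= t.
        frr = sum(s < t for s in target_scores) / len(target_scores)
        far = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        # Keep the threshold where FAR and FRR are closest; report their mean.
        if abs(far - frr) < best_gap:
            best_gap, best_eer = abs(far - frr), (far + frr) / 2.0
    return best_eer


def relative_reduction(baseline_eer, new_eer):
    """Relative EER reduction, e.g. 0.3 means a 30% relative improvement."""
    return (baseline_eer - new_eer) / baseline_eer


if __name__ == "__main__":
    targets = [0.9, 0.8, 0.7, 0.3]     # made-up client scores
    impostors = [0.6, 0.4, 0.2, 0.1]   # made-up impostor scores
    print(equal_error_rate(targets, impostors))
```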

Automatic Clustering of Speech Data Using Modified MAP Adaptation Technique (수정된 MAP 적응 기법을 이용한 음성 데이터 자동 군집화)

Ban, Sung Min;Kang, Byung Ok;Kim, Hyung Soon
- Phonetics and Speech Sciences / v.6 no.1 / pp.77-83 / 2014
This paper proposes a speaker and environment clustering method to overcome the degradation in speech recognition performance caused by diverse noise and speaker characteristics. Instead of the distance between Gaussian mixture model (GMM) weight vectors used in Google's approach, the distance between mean vectors adapted with a modified maximum a posteriori (MAP) adaptation is used as the distance measure for vector quantization (VQ) clustering. In experiments on simulated data generated by adding noise to clean speech, the proposed clustering method yields an error rate reduction of 10.6% compared with the baseline speaker-independent (SI) model, which is slightly better than Google's approach.
https://doi.org/10.13064/KSSS.2014.6.1.077
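VQ clustering on the distance between adapted mean vectors can be sketched with plain k-means (Lloyd's algorithm) as the vector quantizer. The sketch makes three simplifying assumptions. Each utterance is already summarized as one MAP-adapted mean supervector. The distance is squared Euclidean. The codebook is naively initialized with the first k vectors. The paper's modified MAP adaptation and its exact distance measure are not reproduced, and the names are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def vq_cluster(supervectors, k, iters=50):
    """Lloyd-style VQ clustering of adapted mean supervectors."""
    # Naive initialization: the first k supervectors become the codewords.
    codebook = [list(v) for v in supervectors[:k]]
    labels = [0] * len(supervectors)
    for _ in range(iters):
        # Assign each utterance to its nearest codeword.
        labels = [min(range(k), key=lambda c: sq_dist(v, codebook[c]))
                  for v in supervectors]
        # Move each codeword to the centroid of its members (keep it if empty).
        for c in range(k):
            members = [v for v, lab in zip(supervectors, labels) if lab == c]
            if members:
                codebook[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, codebook
```

Each resulting cluster would then get its own adapted acoustic model. A better initialization, such as k-means++, matters more as the number of clusters grows.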

On Speaker Adaptations with Sparse Training Data for Improved Speaker Verification

Ahn, Sung-Joo;Kang, Sun-Mee;Ko, Han-Seok
- Speech Sciences / v.7 no.1 / pp.31-37 / 2000
This paper concerns effective speaker adaptation methods to solve the over-training problem in speaker verification, which frequently occurs when a speaker is modeled with sparse training data. While various speaker adaptation methods have been applied to speech recognition, they have not yet been formally considered in speaker verification. This paper proposes speaker adaptation methods that combine MAP and MLLR adaptation, both used successfully in speech recognition, and applies them to speaker verification. Experimental results show that the speaker verification system using weighted MAP and MLLR adaptation outperforms conventional speaker models without adaptation by a factor of up to 5. These results show that speaker adaptation achieves significantly better performance even when only a small amount of training data is available for speaker verification.

Korean Speaker Verification Using Speaker Adaptation Methods (화자 적응 기술을 이용한 한국어 화자 확인)

Choi Dong-Jin;Oh Yung-Hwan
- Proceedings of the KSPS conference / 2006.05a / pp.139-142 / 2006
Speaker verification systems can be implemented using speaker adaptation methods if the amount of speech available for each target speaker is too small to train the speaker model. This paper shows experimental results using well-known adaptation methods, namely Maximum A Posteriori (MAP) and Maximum Likelihood Linear Regression (MLLR). Experimental results using Korean speech show that MLLR is more effective than MAP for short enrollment utterances.
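A global MLLR mean transform shows why MLLR can get by with less enrollment data than MAP. Every Gaussian shares one D x (D+1) matrix W = [b A], so only D(D+1) parameters are estimated instead of the K x D means that MAP updates individually. This is one plausible reading of the result rather than the paper's stated explanation. A minimal sketch, assuming diagonal covariances, a single regression class, and precomputed occupation counts and first-order statistics. It uses the standard row-by-row closed-form MLLR solution for diagonal covariances, and the function names are hypothetical.

```python
def solve(A, b):
    """Solve A x = b with Gaussian elimination and partial pivoting."""
    n = len(b)
    M = [[float(a) for a in row] + [float(bi)] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def estimate_mllr(counts, first_order, means, variances):
    """Row-by-row ML estimate of W (D x (D+1)) so that adapted mean = W [1, mu]."""
    dim = len(means[0])
    W = []
    for i in range(dim):
        G = [[0.0] * (dim + 1) for _ in range(dim + 1)]
        z = [0.0] * (dim + 1)
        for n_k, f_k, mu_k, var_k in zip(counts, first_order, means, variances):
            xi = [1.0] + list(mu_k)  # extended mean vector [1, mu_k]
            for r in range(dim + 1):
                z[r] += f_k[i] * xi[r] / var_k[i]
                for s in range(dim + 1):
                    G[r][s] += n_k * xi[r] * xi[s] / var_k[i]
        # Row i of W solves G_i w_i = z_i.
        W.append(solve(G, z))
    return W


def apply_mllr(W, means):
    """Adapted means mu' = A mu + b, with W = [b A]."""
    return [[sum(w * x for w, x in zip(row, [1.0] + list(mu))) for row in W]
            for mu in means]
```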

Noisy Environmental Adaptation for Word Recognition System Using Maximum a Posteriori Estimation (최대사후확률 추정법을 이용한 단어인식기의 잡음환경적응화)

Lee, Jung-Hoon;Lee, Shi-Wook;Chung, Hyun-Yeol
- The Journal of the Acoustical Society of Korea / v.16 no.2 / pp.107-113 / 1997
To achieve a Korean word recognition system that is robust to both channel distortion and additive noise, this paper proposes maximum a posteriori estimation (MAP) adaptation and investigates the effectiveness of environmental adaptation for improving recognition performance. To do this, recognition experiments using MAP adaptation are carried out under three different speech conditions: 1) channel distortion is introduced, 2) environmental noise is added, and 3) both channel distortion and additive noise are present. The effectiveness of additional feature parameters, such as regression coefficients and durations, for environmental adaptation is also investigated. In speaker-independent 100-word recognition tests, recognition improved by 9.0% for case 1), by more than 75% for case 2), and by 11% to 61.4% for case 3), showing that MAP environmental adaptation is effective for both channel-distorted and noise-added speech recognition. However, the duration information used as an additional feature parameter did not play an important role in the tests.
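The regression coefficients used as additional feature parameters are assumed here to be ordinary delta coefficients, d_t = sum_n n (c_{t+n} - c_{t-n}) / (2 sum_n n^2), appended to each cepstral frame. A minimal sketch, assuming a window of 2 frames and replication of the first and last frames at the utterance boundaries. The paper's actual window and boundary handling are not stated, and the function name is hypothetical.

```python
def delta_coefficients(frames, window=2):
    """Regression ("delta") coefficients for a list of equal-length feature frames."""
    T = len(frames)
    denom = 2.0 * sum(n * n for n in range(1, window + 1))
    deltas = []
    for t in range(T):
        row = []
        for d in range(len(frames[t])):
            acc = 0.0
            for n in range(1, window + 1):
                # Replicate the edge frames at the boundaries (assumed convention).
                ahead = frames[min(t + n, T - 1)][d]
                behind = frames[max(t - n, 0)][d]
                acc += n * (ahead - behind)
            row.append(acc / denom)
        deltas.append(row)
    return deltas
```

On a linear ramp the interior deltas equal the slope, and a constant feature track gives zero deltas, which is a quick sanity check for the window weights.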

Hybrid Speaker Adaptation using Maximum-Likelihood Estimation (MLE를 이용한 하이브리드 화자 적응)

표현아;김세현;오영환
- Proceedings of the Korean Information Science Society Conference / 2002.10d / pp.268-270 / 2002
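Research on speaker adaptation has recently been active as a way to improve the performance of speech recognition systems. For speaker adaptation that modifies the model parameters of HMM-based recognition systems, the MAP and MLLR methods have been the main focus of research. The two methods perform differently depending on the amount of adaptation data. In this paper, we propose a method that performs speaker adaptation by combining the two existing methods using Maximum-likelihood Estimation (MLE). We applied the proposed method to a Korean 500-city-name word recognition system built by the KAIST Communications Laboratory. It consistently performed well regardless of the amount of adaptation data and improved the recognition rate by up to 4.37% over the existing methods.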
