• Title/Summary/Keyword: speaker adaptation


On Speaker Adaptations with Sparse Training Data for Improved Speaker Verification

  • Ahn, Sung-Joo;Kang, Sun-Mee;Ko, Han-Seok
    • Speech Sciences / v.7 no.1 / pp.31-37 / 2000
  • This paper concerns effective speaker adaptation methods for solving the over-training problem in speaker verification, which frequently occurs when a speaker is modeled with sparse training data. While various speaker adaptation methods have already been applied to speech recognition, they have not yet been formally considered in speaker verification. This paper proposes speaker adaptation methods that combine MAP and MLLR adaptation, both used successfully in speech recognition, and applies them to speaker verification. Experimental results show that the speaker verification system using weighted MAP and MLLR adaptation outperforms conventional speaker models without adaptation by a factor of up to 5. These results show that the speaker adaptation method achieves significantly better performance even when only a small amount of training data is available for speaker verification.


Speaker Adaptation Using ICA-Based Feature Transformation

  • Jung, Ho-Young;Park, Man-Soo;Kim, Hoi-Rin;Hahn, Min-Soo
    • ETRI Journal / v.24 no.6 / pp.469-472 / 2002
  • Speaker adaptation techniques are generally used to reduce speaker differences in speech recognition. In this work, we focus on the features fitted to a linear regression-based speaker adaptation. These are obtained by feature transformation based on independent component analysis (ICA), and the feature transformation matrices are estimated from the training data and adaptation data. Since the adaptation data is not sufficient to reliably estimate the ICA-based feature transformation matrix, it is necessary to adjust the ICA-based feature transformation matrix estimated from a new speaker utterance. To cope with this problem, we propose a smoothing method through a linear interpolation between the speaker-independent (SI) feature transformation matrix and the speaker-dependent (SD) feature transformation matrix. From our experiments, we observed that the proposed method is more effective in the mismatched case. In the mismatched case, the adaptation performance is improved because the smoothed feature transformation matrix makes speaker adaptation using noisy speech more robust.

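The smoothing step in the abstract above, a linear interpolation between the SI and SD feature transformation matrices, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the weight name `lam`, its range, and the choice to put `lam` on the SD matrix are assumptions, since the abstract does not say how the interpolation weight is defined or chosen.

```python
def interpolate_transforms(w_si, w_sd, lam):
    """Return lam * w_sd + (1 - lam) * w_si, element by element.

    w_si, w_sd: square matrices given as lists of rows (hypothetical inputs).
    lam: assumed interpolation weight in [0, 1]; 0 keeps the SI matrix,
         1 keeps the SD matrix.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    if len(w_si) != len(w_sd) or any(len(a) != len(b) for a, b in zip(w_si, w_sd)):
        raise ValueError("matrices must have the same shape")
    return [
        [lam * sd + (1.0 - lam) * si for si, sd in zip(row_si, row_sd)]
        for row_si, row_sd in zip(w_si, w_sd)
    ]


def apply_transform(w, x):
    """Multiply matrix w by feature vector x."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


# Toy 2x2 example: an SD matrix estimated from little data is pulled
# toward the SI matrix before being applied to a feature vector.
w_si = [[1.0, 0.0], [0.0, 1.0]]
w_sd = [[0.5, 0.5], [-0.5, 1.5]]
w_smooth = interpolate_transforms(w_si, w_sd, 0.25)
features = apply_transform(w_smooth, [2.0, 4.0])
```

With a small `lam`, the smoothed matrix stays close to the SI estimate, which is the behavior one would want when the adaptation data is too sparse to trust the SD estimate on its own.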

Emotional Speaker Recognition using Emotional Adaptation (감정 적응을 이용한 감정 화자 인식)

  • Kim, Weon-Goo
    • The Transactions of The Korean Institute of Electrical Engineers / v.66 no.7 / pp.1105-1110 / 2017
  • Speech with various emotions degrades the performance of speaker recognition systems. This paper proposes a speaker recognition method using emotional adaptation to improve the performance of speaker recognition on affective speech. For emotional adaptation, an emotional speaker model is generated from the emotion-free speaker model by applying a speaker adaptation method to a small number of affective training utterances. Since it is not easy to obtain sufficient affective speech from a speaker for training, using a small number of affective utterances is very practical in real situations. The proposed method was evaluated using a Korean database containing four emotions. Experimental results show that the proposed method performs better than conventional methods in speaker verification and speaker recognition.

Large Scale Voice Dialling using Speaker Adaptation (화자 적응을 이용한 대용량 음성 다이얼링)

  • Kim, Weon-Goo
    • Journal of Institute of Control, Robotics and Systems / v.16 no.4 / pp.335-338 / 2010
  • A new method that uses speaker adaptation to improve the performance of a large-scale voice dialling system is presented. Since an SI (speaker-independent) speech recognition system with phoneme HMMs uses only the phoneme string of the input sentence, the storage space can be reduced greatly. However, the performance of such a system is worse than that of a speaker-dependent system because of the mismatch between the input utterance and the SI models. A new method that iteratively estimates the phonetic string and the adaptation vectors is presented to reduce the mismatch between the training utterances and a set of SI models using speaker adaptation techniques. For speaker adaptation, stochastic matching methods are used to estimate the adaptation vectors. Experiments performed over actual telephone lines show that the proposed method performs better than the conventional method with the SI phonetic recognizer.

Performance Comparison and Duration Model Improvement of Speaker Adaptation Methods in HMM-based Korean Speech Synthesis (HMM 기반 한국어 음성합성에서의 화자적응 방식 성능비교 및 지속시간 모델 개선)

  • Lee, Hea-Min;Kim, Hyung-Soon
    • Phonetics and Speech Sciences / v.4 no.3 / pp.111-117 / 2012
  • In this paper, we compare the performance of several speaker adaptation methods for an HMM-based Korean speech synthesis system with small amounts of adaptation data. According to objective and subjective evaluations, a hybrid method of constrained structural maximum a posteriori linear regression (CSMAPLR) and maximum a posteriori (MAP) adaptation performs better than the other methods when only five minutes of adaptation data are available for the target speaker. During the objective evaluation, we find that the duration models are not adapted to the target speaker as well as the spectral envelope and pitch models are. To alleviate this problem, we propose a duration rectification method and a duration interpolation method. Both the objective and subjective evaluations reveal that incorporating the two proposed methods into the conventional speaker adaptation method effectively improves the performance of duration model adaptation.

SVM Based Speaker Verification Using Sparse Maximum A Posteriori Adaptation

  • Kim, Younggwan;Roh, Jaeyoung;Kim, Hoirin
    • IEIE Transactions on Smart Processing and Computing / v.2 no.5 / pp.277-281 / 2013
  • Modern speaker verification systems based on support vector machines (SVMs) use Gaussian mixture model (GMM) supervectors as their input feature vectors, and maximum a posteriori (MAP) adaptation is the conventional method for generating speaker-dependent GMMs by adapting a universal background model (UBM). MAP adaptation requires an adequate amount of input speech because of the number of model parameters to be estimated. With limited utterances, MAP adaptation can become unreliable, which causes adaptation noise even though the Bayesian priors used in MAP adaptation smooth the movements between the UBM and the speaker-dependent GMMs. This paper proposes a sparse MAP adaptation method, which is known to perform well in the automatic speech recognition area. By introducing sparse MAP adaptation to the GMM-SVM-based speaker verification system, the adaptation noise can be mitigated effectively. The proposed method uses the L0 norm as a regularizer to induce sparsity. Experimental results on the TIMIT database show that the sparse MAP-based GMM-SVM speaker verification system yields a 42.6% relative reduction in the equal error rate with few additional computations.

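As background for the abstract above, conventional MAP adaptation of UBM means and the stacking of adapted means into a GMM supervector can be sketched as below. This is only the commonly used relevance-factor form of mean-only MAP adaptation, not the sparse L0-regularized variant the paper proposes. The function names, the `relevance` parameter, and the assumption that per-frame mixture posteriors (`resp`) are already computed are all illustrative choices.

```python
def map_adapt_means(ubm_means, frames, resp, relevance=16.0):
    """Mean-only relevance MAP adaptation (illustrative sketch).

    ubm_means: list of K mean vectors from the UBM.
    frames:    list of T feature vectors from the target speaker.
    resp:      T x K posteriors of each mixture for each frame (assumed given).
    relevance: assumed relevance factor controlling how strongly the prior
               (UBM) mean resists the new data.
    """
    dim = len(ubm_means[0])
    adapted = []
    for k, prior_mean in enumerate(ubm_means):
        n_k = sum(r[k] for r in resp)  # soft count of frames for mixture k
        if n_k == 0.0:
            # No data observed for this mixture: keep the UBM mean.
            adapted.append(list(prior_mean))
            continue
        # Posterior-weighted average of the frames assigned to mixture k.
        data_mean = [
            sum(r[k] * x[d] for r, x in zip(resp, frames)) / n_k
            for d in range(dim)
        ]
        alpha = n_k / (n_k + relevance)  # data-dependent mixing weight
        adapted.append(
            [alpha * e + (1.0 - alpha) * m for e, m in zip(data_mean, prior_mean)]
        )
    return adapted


def supervector(means):
    """Concatenate the mixture means into one vector for the SVM."""
    return [v for mean in means for v in mean]


# Toy example: two mixtures, two frames that both belong to mixture 0.
ubm_means = [[0.0, 0.0], [4.0, 4.0]]
frames = [[1.0, 1.0], [3.0, 1.0]]
resp = [[1.0, 0.0], [1.0, 0.0]]
adapted = map_adapt_means(ubm_means, frames, resp, relevance=2.0)
sv = supervector(adapted)
```

In the toy example, mixture 0 moves only halfway toward the data mean because the soft count equals the relevance factor, while mixture 1 stays at its UBM value. With very few frames, those partial moves are estimated from little evidence, which is the kind of unreliability the abstract calls adaptation noise.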

Adaptation and Clustering Method for Speaker Identification with Small Training Data (화자적응과 군집화를 이용한 화자식별 시스템의 성능 및 속도 향상)

  • Kim Se-Hyun;Oh Yung-Hwan
    • MALSORI / no.58 / pp.83-99 / 2006
  • One key factor that hinders the widespread deployment of speaker identification technologies is the requirement of long enrollment utterances to guarantee a low error rate during identification. To gain user acceptance of speaker identification technologies, adaptation algorithms that can enroll speakers with short utterances are essential. To this end, this paper applies MLLR speaker adaptation to speaker enrollment and compares its performance against other speaker modeling techniques: GMMs and HMMs. Also, to speed up identification, we apply a speaker clustering method that uses principal component analysis (PCA) and a weighted Euclidean distance as the distance measure. Experimental results show that the MLLR-adapted modeling method is most effective for short enrollment utterances and that GMMs perform better when long utterances are available.

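The weighted Euclidean distance mentioned in the abstract above can be illustrated with a short sketch. The abstract does not say how the weights are derived or how clusters are searched, so the per-dimension `weights`, the centroid representation, and the nearest-cluster lookup are assumptions made for illustration only.

```python
import math


def weighted_euclidean(x, y, weights):
    """sqrt(sum_i w_i * (x_i - y_i)^2) for equal-length vectors."""
    if not (len(x) == len(y) == len(weights)):
        raise ValueError("x, y and weights must have the same length")
    return math.sqrt(sum(w * (a - b) ** 2 for a, b, w in zip(x, y, weights)))


def nearest_cluster(x, centroids, weights):
    """Index of the centroid closest to x under the weighted distance."""
    return min(
        range(len(centroids)),
        key=lambda i: weighted_euclidean(x, centroids[i], weights),
    )


# Toy example in a 2-D (for instance PCA-projected) space.
centroids = [[0.0, 0.0], [1.0, 5.0]]
query = [1.0, 0.0]
uniform_pick = nearest_cluster(query, centroids, [1.0, 1.0])
first_dim_only_pick = nearest_cluster(query, centroids, [1.0, 0.0])
```

Changing the weights changes which cluster is selected for the same query, so the weighting scheme directly controls which speakers are kept as candidates when clustering is used to narrow the search.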

Korean Speaker Verification Using Speaker Adaptation Methods (화자 적응 기술을 이용한 한국어 화자 확인)

  • Choi Dong-Jin;Oh Yung-Hwan
    • Proceedings of the KSPS conference / 2006.05a / pp.139-142 / 2006
  • Speaker verification systems can be implemented using speaker adaptation methods if the amount of speech available for each target speaker is too small to train the speaker model. This paper shows experimental results using well-known adaptation methods, namely Maximum A Posteriori (MAP) and Maximum Likelihood Linear Regression (MLLR). Experimental results using Korean speech show that MLLR is more effective than MAP for short enrollment utterances.

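For readers comparing the two methods named in the abstract above, the following sketch shows how an MLLR mean transform is applied once it has been estimated. It covers only the application step with a single global transform; the maximum-likelihood estimation of the matrix and bias is omitted, and the names `a_matrix` and `bias` and the toy values are hypothetical.

```python
def mllr_transform_means(means, a_matrix, bias):
    """Apply mu_hat = A * mu + b to every Gaussian mean.

    means:    list of mean vectors from the speaker-independent model.
    a_matrix: shared square matrix A (hypothetical, assumed already estimated).
    bias:     shared bias vector b (hypothetical, assumed already estimated).
    """
    transformed = []
    for mu in means:
        transformed.append(
            [
                sum(a_ij * mu_j for a_ij, mu_j in zip(row, mu)) + b_i
                for row, b_i in zip(a_matrix, bias)
            ]
        )
    return transformed


# Toy example: one shared transform shifts all means at once.
si_means = [[0.0, 0.0], [4.0, 4.0]]
a_matrix = [[1.0, 0.0], [0.0, 1.0]]
bias = [0.5, -0.5]
adapted_means = mllr_transform_means(si_means, a_matrix, bias)
```

Because the transform is shared, every mean is updated even if no enrollment frames were assigned to its Gaussian. This is one commonly cited reason MLLR can cope with short enrollment utterances, whereas MAP updates only the components that observed data.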

Speaker Adaptation using ICA-based Feature Transformation (ICA 기반의 특징변환을 이용한 화자적응)

  • Park ManSoo;Kim Hoi-Rin
    • MALSORI / no.43 / pp.127-136 / 2002
  • Speaker adaptation techniques are generally used to reduce speaker differences in speech recognition. In this work, we focus on the features fitted to a linear regression-based speaker adaptation. These are obtained by feature transformation based on independent component analysis (ICA), and the transformation matrix is learned from speaker-independent training data. When the amount of data is small, however, it is necessary to adjust the ICA-based transformation matrix estimated from a new speaker's utterance. To cope with this problem, we propose a smoothing method based on linear interpolation between the speaker-independent (SI) feature transformation matrix and the speaker-dependent (SD) feature transformation matrix. We observed that the proposed technique effectively improves adaptation performance.


Performance of Vocabulary-Independent Speech Recognizers with Speaker Adaptation

  • Kwon, Oh Wook;Un, Chong Kwan;Kim, Hoi Rin
    • The Journal of the Acoustical Society of Korea / v.16 no.1E / pp.57-63 / 1997
  • In this paper, we investigated the performance of a vocabulary-independent speech recognizer with speaker adaptation. The vocabulary-independent speech recognizer does not require task-oriented speech databases to estimate HMM parameters, but adapts the parameters recursively by using input speech and recognition results. The recognizer has the advantage that it removes the effort of recording speech databases and can be easily adapted to a new task and a new speaker with a different recognition vocabulary without losing recognition accuracy. Experimental results showed that the vocabulary-independent speech recognizer with supervised offline speaker adaptation reduced recognition errors by 40% when 80 words from the same vocabulary as the test data were used as adaptation data. The recognizer with unsupervised online speaker adaptation reduced recognition errors by about 43%. This performance is comparable to that of a speaker-independent speech recognizer trained on a task-oriented speech database.
