• Title/Summary/Keyword: Speaker Adaptation

Probabilistic Bilinear Transformation Space-Based Joint Maximum A Posteriori Adaptation

  • Song, Hwa Jeon;Lee, Yunkeun;Kim, Hyung Soon
    • ETRI Journal / v.34 no.5 / pp.783-786 / 2012
  • This letter proposes a more advanced joint maximum a posteriori (MAP) adaptation using a prior model based on a probabilistic scheme that applies the bilinear transformation (BIT) concept. The proposed method not only has scalable parameters but is also based on a single prior distribution, without the heuristic parameters of the previous joint BIT-MAP method. Experimental results show that the proposed method consistently improves on the previous method, irrespective of the amount of adaptation data.

Noise Robust Speaker Verification Using Subband-Based Reliable Feature Selection (신뢰성 높은 서브밴드 특징벡터 선택을 이용한 잡음에 강인한 화자검증)

  • Kim, Sung-Tak;Ji, Mi-Kyong;Kim, Hoi-Rin
    • MALSORI / no.63 / pp.125-137 / 2007
  • Many techniques have recently been proposed to improve the noise robustness of speaker verification. In this paper, we consider the feature recombination technique in the multi-band approach. Conventional feature recombination for speaker verification uses all feature components to compute the likelihoods of the speaker models or the universal background model, which is not effective from the viewpoint of the multi-band approach. To address this weakness, we introduce a subband likelihood computation and propose a modified feature recombination that uses subband likelihoods. In the decision step of a speaker verification system operating in noisy environments, a few very low likelihood scores from a speaker model or the universal background model can cause the system to make a wrong decision. To overcome this problem, a reliable feature selection method is proposed: the low likelihood scores of unreliable features are replaced by the likelihood scores of an adaptive noise model. This adaptive noise model is estimated by maximum a posteriori adaptation using noise features obtained directly from the noisy test speech. The proposed method using subband-based reliable feature selection performs better than the conventional feature recombination system, with an error reduction rate of more than 31% relative to the feature recombination-based speaker verification system.
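
The score substitution described in this abstract might be sketched roughly as follows. This is only an illustration under stated assumptions: the abstract does not say how an "unreliable" subband is detected, so the fixed `floor` threshold, the function name `verification_score`, and the summing of per-subband log-likelihood ratios are all hypothetical choices rather than the authors' method.

```python
def verification_score(speaker_ll, ubm_ll, noise_ll, floor=-50.0):
    """Sum per-subband log-likelihood ratios after score substitution.

    speaker_ll, ubm_ll, noise_ll: per-subband log-likelihoods from the
    speaker model, the universal background model, and the adaptive
    noise model. `floor` is an assumed reliability threshold.
    """
    score = 0.0
    for spk, ubm, noise in zip(speaker_ll, ubm_ll, noise_ll):
        # A very low score is treated as unreliable and replaced by the
        # noise-model score, so one bad subband cannot dominate the sum.
        if spk < floor:
            spk = noise
        if ubm < floor:
            ubm = noise
        score += spk - ubm
    return score


if __name__ == "__main__":
    speaker = [-10.0, -80.0, -12.0]   # second subband is corrupted
    ubm = [-12.0, -15.0, -13.0]
    noise = [-20.0, -18.0, -25.0]
    print(verification_score(speaker, ubm, noise))
```

Without the substitution, the single corrupted subband in the toy input would pull the total score far below zero; with it, the remaining subbands decide the outcome.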

An Amplitude Warping Approach to Intra-Speaker Normalization for Speech Recognition (음성인식에서 화자 내 정규화를 위한 진폭 변경 방법)

  • Kim Dong-Hyun;Hong Kwang-Seok
    • Journal of Internet Computing and Services / v.4 no.3 / pp.9-14 / 2003
  • Vocal tract normalization is a successful method for improving the accuracy of inter-speaker normalization. In this paper, we present an intra-speaker warping factor estimation based on pitch-altered utterances. The feature-space distributions of untransformed speech from a speaker's pitch-altered utterances vary because of acoustic differences in the speech produced by the glottis and the vocal tract. Utterances vary in two ways: in frequency and in amplitude. Among inter-speaker normalization methods, vocal tract normalization is a frequency normalization. We therefore also have to consider amplitude variation, and it may be possible to determine the amplitude warping factor by calculating the inverse ratio of input to reference pitch. In the recognition results, error-rate reductions range from 0.4% to 2.3% for digit and word decoding.

A Study on Speaker Adaptation of Large Continuous Spoken Language Using back-off bigram (Back-off bigram을 이랑한 대용량 연속어의 화자적응에 관한 연구)

  • 최학윤
    • The Journal of Korean Institute of Communications and Information Sciences / v.28 no.9C / pp.884-890 / 2003
  • In this paper, we study speaker adaptation methods that improve a speaker-independent recognition system. For speaker-independent recognition, we compare results between bigram and back-off bigram language models and between MAP and MLLR adaptation. Because the back-off bigram uses the unigram probability and a back-off weight in place of the bigram probability, it has the effect of adding a small weight to the bigram probability. The experiments use 39-dimensional feature vectors as speech parameters: 12 MFCCs, log energy, and their delta and delta-delta parameters. For the recognition experiments, we constructed a system based on CHMMs, triphone recognition units, and bigram and back-off bigram language models.
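
The back-off idea mentioned in this abstract can be illustrated with a small sketch. It assumes absolute discounting with a hypothetical discount of 0.5 and a toy token sequence; the abstract does not state which discounting scheme the paper used, so `train_backoff_bigram` and its parameters are illustrative names only.

```python
from collections import Counter


def train_backoff_bigram(tokens, discount=0.5):
    """Return prob(word, prev) for a back-off bigram model.

    Assumption: absolute discounting with a fixed `discount`; this is
    one common choice, not necessarily the one used in the paper.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    history = Counter(tokens[:-1])      # how often each word is a history
    total = len(tokens)

    followers = {}
    for prev, word in bigrams:
        followers.setdefault(prev, set()).add(word)

    def p_unigram(word):
        return unigrams[word] / total

    def prob(word, prev):
        seen = followers.get(prev, set())
        if word in seen:
            # Seen bigram: discounted relative frequency.
            return (bigrams[(prev, word)] - discount) / history[prev]
        if not seen:
            # Unknown history: fall back to the unigram directly.
            return p_unigram(word)
        # Unseen bigram: back-off weight times unigram probability.
        reserved = discount * len(seen) / history[prev]
        unseen_mass = 1.0 - sum(p_unigram(w) for w in seen)
        if unseen_mass <= 1e-12:
            return 0.0
        return (reserved / unseen_mass) * p_unigram(word)

    return prob


if __name__ == "__main__":
    prob = train_backoff_bigram("a b a c a b".split())
    for w in "abc":
        print(w, prob(w, "a"))
```

The back-off weight is chosen so that the probabilities of all words following a given history still sum to one, which is why unseen bigrams receive only a small share of the mass.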

Fluidic velocity sensing with a speaker based optical doppler tomography (유속 센싱을 위한 스피커형 광학적 유체 단층촬영 기술)

  • Lee, Chang-Ho;Kim, Jee-Hyun
    • Journal of Sensor Science and Technology / v.17 no.4 / pp.317-324 / 2008
  • This paper presents an optical Doppler tomography (ODT) system that uses a speaker to achieve depth measurement in a flowing sample. Using a speaker makes the system easy to implement at low cost. The nonlinear characteristics of the speaker have hindered its adoption because they produce inconsistent fringe frequencies at different depths. This paper reports an adaptive algorithm that compensates for the nonlinear characteristics and thereby recovers the Doppler frequency shift caused by the sample. The experiment uses a solution of scattering particles flowing through a capillary tube at a given flow rate. The Doppler frequency profile across the lumen was calculated using the spectrogram method, and a velocity image of the sample was obtained.

Short utterance speaker verification using PLDA model adaptation and data augmentation (PLDA 모델 적응과 데이터 증강을 이용한 짧은 발화 화자검증)

  • Yoon, Sung-Wook;Kwon, Oh-Wook
    • Phonetics and Speech Sciences / v.9 no.2 / pp.85-94 / 2017
  • Conventional speaker verification systems using a time delay neural network, identity vectors, and probabilistic linear discriminant analysis (TDNN-Ivector-PLDA) are known to be very effective for verifying long-duration speech utterances. However, when test utterances are short, the duration mismatch between enrollment and test utterances significantly degrades the performance of TDNN-Ivector-PLDA systems. To compensate for the I-vector mismatch between long and short utterances, this paper proposes PLDA model adaptation with augmented data. A PLDA model is trained on a vast amount of speech data, most of which is of long duration. The PLDA model is then adapted with I-vectors obtained from short-utterance data augmented using vocal tract length perturbation (VTLP). In computer experiments on the NIST SRE 2008 database, the proposed method achieves significantly better performance than conventional TDNN-Ivector-PLDA systems when there is a duration mismatch between enrollment and test utterances.

A Study for Effective Speaker Adaptation and a priori Threshold Updating in Speaker Verification (화자 인증에서의 효과적인 화자 적응과 a priori Threshold Updating에 관한 연구)

  • 조영훈;이수호;홍대희;고한석
    • Proceedings of the IEEK Conference / 2001.09a / pp.491-494 / 2001
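  • The biggest problems in designing a practical speaker verification system are twofold. First, because the speaker model is built from a small amount of enrollment data, verification performance degrades considerably over time. Second, because the threshold is set using only previously collected training data, it cannot account for the variation that arises later in actual use, which also degrades performance. To address these problems, this paper applies the MAP method to building the speaker model and applies a method for resetting the threshold. The proposed method is shown to reduce the HTER by about 23%.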
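
As a rough illustration of the two quantities named in this abstract, the sketch below shows a textbook MAP update of a single Gaussian mean and the usual definition of the half total error rate (HTER). It is a sketch under assumptions: the relevance factor `tau`, the single-Gaussian simplification, and the function names are hypothetical, and the paper's actual model and threshold-resetting rule are not given in the abstract.

```python
def map_adapt_mean(prior_mean, frames, tau=10.0):
    """MAP-adapt a Gaussian mean toward new enrollment frames.

    prior_mean: mean vector of the existing speaker model.
    frames: list of feature vectors from new adaptation data.
    tau: assumed relevance factor; a larger tau trusts the prior more.
    """
    n = len(frames)
    dim = len(prior_mean)
    sums = [sum(frame[d] for frame in frames) for d in range(dim)]
    # Interpolate between the prior mean and the data mean, weighted by
    # the relevance factor and the number of adaptation frames.
    return [(tau * prior_mean[d] + sums[d]) / (tau + n) for d in range(dim)]


def half_total_error_rate(far, frr):
    """HTER: average of the false acceptance and false rejection rates."""
    return 0.5 * (far + frr)


if __name__ == "__main__":
    adapted = map_adapt_mean([0.0, 0.0], [[1.0, 2.0]] * 10, tau=10.0)
    print(adapted)
    print(half_total_error_rate(0.10, 0.30))
```

With few adaptation frames the adapted mean stays close to the prior, and it moves toward the data mean as more frames arrive, which is what makes MAP a natural fit for models enrolled on little data.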

Selective Attentive Learning for Fast Speaker Adaptation in Multilayer Perceptron (다층 퍼셉트론에서의 빠른 화자 적응을 위한 선택적 주의 학습)

  • 김인철;진성일
    • The Journal of the Acoustical Society of Korea / v.20 no.4 / pp.48-53 / 2001
  • In this paper, a selective attentive learning method is proposed to improve the learning speed of a multilayer perceptron trained with the error backpropagation algorithm. Three attention criteria are introduced to determine which set of input patterns, or which portion of the network, should be attended to for effective learning. These criteria are based on the mean square error function of the output layer and the class-selective relevance of the hidden nodes. Learning is accelerated by lowering the computational cost per iteration. The effectiveness of the proposed method is demonstrated on a speaker adaptation task for an isolated word recognition system. The experimental results show that the proposed selective attention technique reduces the learning time by more than 60% on average.

Sequential Adaptation Algorithm Based on Transformation Space Model for Speech Recognition (음성인식을 위한 변환 공간 모델에 근거한 순차 적응기법)

  • Kim, Dong-Kook;Chang, Joo-Hyuk;Kim, Nam-Soo
    • Speech Sciences / v.11 no.4 / pp.75-88 / 2004
  • In this paper, we propose a new approach to sequential linear regression adaptation of continuous density hidden Markov models (CDHMMs) based on a transformation space model (TSM). The proposed TSM, which characterizes the a priori knowledge of the training speakers associated with maximum likelihood linear regression (MLLR) matrix parameters, is described effectively in terms of latent variable models. The TSM provides several sources of information that are very useful for rapid adaptation, such as correlation information, the prior distribution, and prior knowledge of the regression parameters. A quasi-Bayes (QB) estimation algorithm is formulated to incrementally update the hyperparameters of the TSM and the regression matrices simultaneously. Experimental results show that the proposed TSM approach outperforms the conventional quasi-Bayes linear regression (QBLR) algorithm when only a small amount of adaptation data is available.
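
For readers unfamiliar with the regression matrices this abstract refers to, the sketch below shows only the standard MLLR mean transform, in which an adapted mean is an affine function of the original Gaussian mean. It does not implement the TSM or the quasi-Bayes update; the function name and the toy matrix values are hypothetical.

```python
def mllr_transform_mean(mean, A, b):
    """Apply an MLLR-style affine transform to a Gaussian mean.

    Computes A @ mean + b with plain lists. A is a square matrix given
    as a list of rows, b is a bias vector. Estimating A and b from
    adaptation data is the hard part and is not shown here.
    """
    return [
        sum(a_ij * m_j for a_ij, m_j in zip(row, mean)) + b_i
        for row, b_i in zip(A, b)
    ]


if __name__ == "__main__":
    mean = [1.0, 2.0]
    A = [[1.0, 0.0],
         [0.0, 2.0]]
    b = [0.5, -1.0]
    print(mllr_transform_mean(mean, A, b))
```

Because one transform is shared by many Gaussians, a small amount of adaptation data can shift the whole model, which is why prior knowledge about the regression parameters matters for rapid adaptation.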

Phonetic Transcription based Speech Recognition using Stochastic Matching Method (확률적 매칭 방법을 사용한 음소열 기반 음성 인식)

  • Kim, Weon-Goo
    • Journal of the Korean Institute of Intelligent Systems / v.17 no.5 / pp.696-700 / 2007
  • A new method is presented that improves the performance of a phonetic transcription-based speech recognition system that uses a speaker-independent (SI) phonetic recognizer. Because an SI phoneme HMM-based speech recognition system stores only the phoneme transcription of the input sentence, the storage space can be reduced greatly. However, the performance of the system is worse than that of a speaker-dependent system because of the phoneme recognition errors produced by the SI models. A new training method that iteratively estimates the phonetic transcription and the transformation vectors is presented to reduce the mismatch between the training utterances and a set of SI models using speaker adaptation techniques. For speaker adaptation, stochastic matching methods are used to estimate the transformation vectors. Experiments performed over actual telephone lines show that the error rate is reduced by about 45% compared with the conventional method.