• Title/Summary/Keyword: Rapid speaker adaptation

Rapid Speaker Adaptation Based on Eigenvoice Using Weight Distribution Characteristics (가중치 분포 특성을 이용한 Eigenvoice 기반 고속화자적응)

  • 박종세;김형순;송화전
    • The Journal of the Acoustical Society of Korea / v.22 no.5 / pp.403-407 / 2003
  • The eigenvoice approach has recently been widely used for rapid speaker adaptation. Even with eigenvoices, however, the improvement obtained from a very small amount of adaptation data is relatively small compared with that obtained from a somewhat larger amount, because the eigenvoice weights are difficult to estimate reliably. In this paper, we propose an eigenvoice-based rapid speaker adaptation method that uses the weight distribution characteristics to improve performance with small amounts of adaptation data. In experiments on a vocabulary-independent word recognition task (using the PBW 452 database), the weight threshold method alleviates the relatively low performance obtained with very small amounts of adaptation data. When a single adaptation word is used, the weight threshold method reduces the word error rate by about 9-18%.
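
As a rough illustration of the idea in this abstract (not the authors' implementation), an eigenvoice model writes a speaker's mean supervector as the training-set mean plus a weighted sum of eigenvoices. One simple way to exploit weight distribution statistics is to clip each estimated weight to a range observed over the training speakers. The function names, the projection-based weight estimate (real systems typically use a maximum-likelihood estimate), and the `k`-sigma clipping rule below are all assumptions made for this sketch.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def estimate_weights(observed, mean, eigenvoices):
    """Project the centered observation onto orthonormal eigenvoices (simplified estimate)."""
    centered = [o - m for o, m in zip(observed, mean)]
    return [dot(centered, e) for e in eigenvoices]

def clip_weights(weights, train_means, train_stds, k=2.0):
    """Clip each weight to mean +/- k*std of that weight over the training speakers (assumed rule)."""
    clipped = []
    for w, mu, sd in zip(weights, train_means, train_stds):
        lo, hi = mu - k * sd, mu + k * sd
        clipped.append(min(max(w, lo), hi))
    return clipped

def adapt(mean, eigenvoices, weights):
    """Adapted supervector = mean + sum_k weights[k] * eigenvoices[k]."""
    adapted = list(mean)
    for w, e in zip(weights, eigenvoices):
        adapted = [a + w * x for a, x in zip(adapted, e)]
    return adapted

# Toy example: 3-dimensional supervectors and 2 orthonormal eigenvoices.
mean = [0.0, 0.0, 0.0]
eigenvoices = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
observed = [5.0, 0.5, 0.2]      # noisy estimate from very little adaptation data

w = estimate_weights(observed, mean, eigenvoices)
w_clipped = clip_weights(w, train_means=[0.0, 0.0], train_stds=[1.0, 1.0], k=2.0)
adapted = adapt(mean, eigenvoices, w_clipped)
```

In this toy case the first weight is an outlier relative to the assumed training statistics and is pulled back to the edge of the allowed range, while the second weight is left unchanged.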

Rapid Speaker Adaptation for Continuous Speech Recognition Using Merging Eigenvoices (Eigenvoice 병합을 이용한 연속 음성 인식 시스템의 고속 화자 적응)

  • Choi, Dong-Jin;Oh, Yung-Hwan
    • MALSORI / no.53 / pp.143-156 / 2005
  • Speaker adaptation in eigenvoice space is a popular method for rapid speaker adaptation. To improve its performance, the number of speaker-dependent models should be increased and the eigenvoices re-estimated. However, finding eigenvoices with principal component analysis takes a long time, especially in a continuous speech recognition system. This paper describes a method that reduces computation time by estimating eigenvoices only for the supplementary speaker-dependent models and merging them with the eigenvoices already in use. Experimental results show that computation time is reduced by 73.7% while performance remains almost the same when the number of added speaker-dependent models is the same as the number already used.

Performance Improvement of Rapid Speaker Adaptation Using Bias Compensation and Mean of Dimensional Eigenvoice Models (바이어스 보상과 차원별 Eigenvoice 모델 평균을 이용한 고속화자적응의 성능향상)

  • 박종세;김형순;송화전
    • The Journal of the Acoustical Society of Korea / v.23 no.5 / pp.383-389 / 2004
  • In this paper, we propose bias compensation methods and an eigenvoice method using the mean of dimensional eigenvoice models to improve the performance of eigenvoice-based rapid speaker adaptation under a mismatch between training and test environments. Experimental results on a vocabulary-independent word recognition task (using the PBW 452 DB) show that the proposed methods yield improvements for small amounts of adaptation data. The bias compensation methods gave a relative improvement of about 22-30% as the amount of adaptation data varied from 1 to 50, and the eigenvoice method using the mean of dimensional eigenvoice models gave a 41% relative improvement in error rate with only a single adaptation word.

Rapid Speaker Adaptation Based on MAPLR with Adaptive Hybrid Priors Estimated from Reference Speakers (참조화자로부터 추정된 적응적 혼성 사전분포를 이용한 MAPLR 고속 화자적응)

  • Song, Young-Rok;Kim, Hyung-Soon
    • The Journal of the Acoustical Society of Korea / v.30 no.6 / pp.315-323 / 2011
  • This paper proposes two methods of estimating the prior distribution to improve the performance of rapid speaker adaptation based on maximum a posteriori linear regression (MAPLR). In general, the prior distribution of the transformation matrix used in MAPLR adaptation is estimated from all of the training speakers used to construct the speaker-independent model, and it is applied identically to all new speakers. In this paper, we propose a method in which the prior distribution is estimated from a group of reference speakers, selected using the adaptation data, so that the acoustic characteristics of the selected reference speakers are similar to those of the new speaker. Additionally, for MAPLR adaptation with a block-diagonal transformation matrix, we propose a method in which the mean matrix and the covariance matrix of the prior distribution are estimated, respectively, from two groups of transformation matrices obtained from the same training speakers. To evaluate the proposed methods, we examine word accuracy as a function of the number of adaptation words in an isolated word recognition task. Experimental results show that, for very limited adaptation data, a statistically significant performance improvement is obtained over conventional MAPLR adaptation.
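
For orientation, the general MAPLR criterion can be written next to the maximum-likelihood (MLLR) criterion as below. The notation is an assumption introduced here, not taken from the abstract: $O$ is the adaptation data, $\lambda$ the speaker-independent model, $W$ the transformation matrix, $\mu$ a Gaussian mean, and $p(W)$ the prior over transformations.

```latex
\begin{align}
\hat{\mu} &= W \xi, \qquad \xi = [\,1,\ \mu^{\top}]^{\top} \\
\hat{W}_{\mathrm{MLLR}} &= \arg\max_{W} \; p(O \mid \lambda, W) \\
\hat{W}_{\mathrm{MAPLR}} &= \arg\max_{W} \; p(O \mid \lambda, W)\, p(W)
\end{align}
```

In these terms, the first proposal in the abstract changes which speakers $p(W)$ is estimated from: selected reference speakers instead of all training speakers.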

Isolated Word Recognition Using a Speaker-Adaptive Neural Network (화자적응 신경망을 이용한 고립단어 인식)

  • 이기희;임인칠
    • Journal of the Korean Institute of Telematics and Electronics B / v.32B no.5 / pp.765-776 / 1995
  • This paper describes a speaker adaptation method to improve the recognition performance of an MLP (multilayer perceptron) based HMM (hidden Markov model) speech recognizer. In this method, we use a first-order linear transformation network to fit the data of a new speaker to the MLP. The transformation parameters are adjusted by back-propagating the classification error to the transformation network while leaving the MLP classifier fixed. The recognition system is based on semicontinuous HMMs that use the MLP as a fuzzy vector quantizer. The experimental results show that this method achieves rapid speaker adaptation with high recognition performance. For supervised adaptation, the error rate is significantly reduced from 9.2% for the baseline system to 5.6% after speaker adaptation. For unsupervised adaptation, the error rate is reduced to 5.1%, without any information from new speakers.
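
The training scheme described here, updating only an input transformation while the classifier stays fixed, can be sketched in a few lines. This is a toy under stated assumptions, not the paper's system: a one-dimensional logistic classifier with fixed parameters `W` and `C` stands in for the trained MLP, the transformation is a scalar affine map `a * x + b`, and the data, learning rate, and iteration count are invented for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Frozen classifier (stand-in for the trained MLP): p(class 1 | x) = sigmoid(W * x + C)
W, C = 2.0, 0.0

def loss_and_grads(a, b, data):
    """Cross-entropy of the frozen classifier on transformed inputs a*x + b,
    with gradients taken only with respect to the transformation parameters."""
    loss = ga = gb = 0.0
    for x, y in data:
        p = sigmoid(W * (a * x + b) + C)
        loss -= y * math.log(p) + (1 - y) * math.log(1 - p)
        d = (p - y) * W          # error back-propagated through the frozen classifier
        ga += d * x
        gb += d
    n = len(data)
    return loss / n, ga / n, gb / n

# Hypothetical new speaker whose features are shifted relative to the training speakers,
# so the frozen decision boundary at x = 0 misclassifies the class-0 samples.
data = [(2.0, 0), (2.5, 0), (3.5, 1), (4.0, 1)]

a, b = 1.0, 0.0                  # start from the identity transformation
initial_loss, _, _ = loss_and_grads(a, b, data)
for _ in range(200):
    _, ga, gb = loss_and_grads(a, b, data)
    a -= 0.1 * ga                # only the transformation is updated
    b -= 0.1 * gb
final_loss, _, _ = loss_and_grads(a, b, data)
```

Because `W` and `C` are never modified, all of the adaptation is absorbed by `a` and `b`, which is the property that keeps the number of speaker-specific parameters small.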

Improvement of MLLR Algorithm for Rapid Speaker Adaptation and Reduction of Computation (빠른 화자 적응과 연산량 감소를 위한 MLLR알고리즘 개선)

  • Kim, Ji-Un;Chung, Jae-Ho
    • The Journal of Korean Institute of Communications and Information Sciences / v.29 no.1C / pp.65-71 / 2004
  • We improved the MLLR speaker adaptation algorithm by reducing the order of the HMM parameters using PCA (principal component analysis) or ICA (independent component analysis). To find a smaller set of variables with less redundancy that represents the data as well as possible, we apply PCA and ICA, which minimize the correlations between data elements and remove the axes with less covariance or higher-order statistical independencies. The ordinary MLLR algorithm needs more than 30 seconds of adaptation data for the SD (speaker-dependent) models to achieve a higher word recognition rate than the SI (speaker-independent) models, whereas the proposed algorithm needs just over 10 seconds. With ICA and PCA, 10 components give performance similar to that of 36 components in the ordinary MLLR framework. Compared with the ordinary MLLR algorithm, the total amount of computation required for speaker adaptation is therefore reduced by about 1/167 in the proposed MLLR algorithm.

Sequential Adaptation Algorithm Based on Transformation Space Model for Speech Recognition (음성인식을 위한 변환 공간 모델에 근거한 순차 적응기법)

  • Kim, Dong-Kook;Chang, Joo-Hyuk;Kim, Nam-Soo
    • Speech Sciences / v.11 no.4 / pp.75-88 / 2004
  • In this paper, we propose a new approach to sequential linear regression adaptation of continuous density hidden Markov models (CDHMMs) based on a transformation space model (TSM). The proposed TSM, which characterizes the a priori knowledge of the training speakers associated with maximum likelihood linear regression (MLLR) matrix parameters, is effectively described in terms of latent variable models. The TSM provides various sources of information, such as the correlation information, the prior distribution, and the prior knowledge of the regression parameters, that are very useful for rapid adaptation. A quasi-Bayes (QB) estimation algorithm is formulated to incrementally update the hyperparameters of the TSM and the regression matrices simultaneously. Experimental results show that the proposed TSM approach performs better than the conventional quasi-Bayes linear regression (QBLR) algorithm for a small amount of adaptation data.

Robust Correlation Estimation for Rapid Speaker Adaptation (EMAP에 기반한 화자적응을 위한 강인한 상관계수의 예측)

  • 전유진;김동국;김남수
    • Proceedings of the IEEK Conference / 2000.09a / pp.113-116 / 2000
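  • In this paper, we present a method that uses probabilistic principal component analysis (PPCA) to improve the performance of a speaker adaptation system based on extended maximum a posteriori (EMAP) estimation. PPCA is applied to robustly estimate the correlation matrix between the individual hidden Markov models (HMMs). The resulting correlation matrix is then used in the speaker adaptation system. PPCA is computationally efficient and yields better performance than the method previously used in EMAP. A series of speech recognition experiments confirms that EMAP with PPCA performs well with a small amount of adaptation data.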
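
For reference, the standard PPCA latent variable model and the covariance structure it implies are shown below. How the observed vector $x$ is built from HMM parameters is not stated in the abstract, so treating $x$ as a $d$-dimensional vector of stacked model parameters with a $q$-dimensional latent variable $z$ is an assumption made here.

```latex
\begin{align}
x &= W z + \mu + \epsilon, \qquad z \sim \mathcal{N}(0, I_q), \quad \epsilon \sim \mathcal{N}(0, \sigma^{2} I_d) \\
\operatorname{Cov}(x) &= W W^{\top} + \sigma^{2} I_d
\end{align}
```

With $q \ll d$, this low-rank-plus-isotropic form needs far fewer parameters than a full covariance matrix, which is one reason such an estimate can be more robust when few training speakers are available.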

A Closed-Form Solution of Linear Spectral Transformation for Robust Speech Recognition

  • Kim, Dong-Hyun;Yook, Dong-Suk
    • ETRI Journal / v.31 no.4 / pp.454-456 / 2009
  • The maximum likelihood linear spectral transformation (ML-LST) using a numerical iteration method has been previously proposed for robust speech recognition. The numerical iteration method is not appropriate for real-time applications due to its computational complexity. In order to reduce the computational cost, the objective function of the ML-LST is approximated and a closed-form solution is proposed in this paper. It is shown experimentally that the proposed closed-form solution for the ML-LST can provide rapid speaker and environment adaptation for robust speech recognition.

Efficient Rapid Speaker Adaptation Using Merging Eigenvoices (Eigenvoice 병합을 이용한 효율적인 고속 화자 적응)

  • Choi, Dong-Jin;Oh, Yung-Hwan
    • Proceedings of the Acoustical Society of Korea Conference / autumn / pp.115-118 / 2004
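  • In speech recognition, various efforts are being made to bring the performance of speaker-independent systems close to that of speaker-dependent systems through speaker adaptation. In particular, interest is growing in rapid speaker adaptation, which uses a very small amount of adaptation data (less than 30 seconds). Eigenvoice-based adaptation is well suited to rapid speaker adaptation, but constructing the eigenvoices requires too much computation and memory. This paper proposes a method that merges separately computed eigenvoices so that they have almost the same accuracy as eigenvoices constructed all at once, and uses them for rapid speaker adaptation. With this method, when training data are added, eigenvoices are computed only for the added data and then merged, instead of being recomputed from scratch, which significantly reduces computation and memory. Experimental results show that memory and computation decrease according to the number of added speaker-dependent models, with almost no performance degradation.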