Quantization Based Speaker Normalization for DHMM Speech Recognition System

DHMM 음성 인식 시스템을 위한 양자화 기반의 화자 정규화

  • 신옥근 (한국해양대학교 자동차정보공학부)
  • Published : 2003.05.01

Abstract

Many studies have addressed speaker normalization, which aims to minimize the effect of differences in speakers' vocal tract length on the recognition performance of speaker-independent speech recognition systems. In this paper, we propose a simple linear-warping speaker normalization method based on a vector quantizer, motivated by the observation that vector quantizers can be used successfully for speaker verification. We first generate an optimal codebook to serve as the basis of the normalization, and then extract the warping factor of an unknown speaker by comparing that speaker's feature vectors with the codebook. Finally, the extracted warping factor is used to linearly warp the Mel-scale filter bank used in the MFCC computation. To evaluate the proposed method, we conducted a series of recognition experiments with a discrete HMM (DHMM) recognizer on thirteen monosyllabic Korean number utterances. The results show that the word error rate is reduced by about 29%, and that the proposed warping factor extraction method is useful because it is simpler than other line-search warping methods.
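
The step of extracting a warping factor by comparing feature vectors with the codebook can be illustrated with a small sketch. The abstract does not give the exact comparison rule, so the code below assumes one plausible reading: try a small set of candidate factors and keep the one whose warped features have the lowest average quantization distortion against the codebook. The names `select_warping_factor` and `toy_extract`, the candidate range 0.88 to 1.12, and the two-dimensional toy data are all hypothetical and are not taken from the paper.

```python
def nearest_distortion(vec, codebook):
    """Squared Euclidean distance from vec to its closest codeword."""
    return min(sum((v - c) ** 2 for v, c in zip(vec, word)) for word in codebook)


def average_distortion(features, codebook):
    """Mean quantization distortion of a set of feature vectors."""
    return sum(nearest_distortion(f, codebook) for f in features) / len(features)


def select_warping_factor(extract_features, codebook, candidates):
    """Return the candidate factor whose warped features best fit the codebook.

    extract_features(alpha) is assumed to return the speaker's feature
    vectors computed with a filter bank warped by alpha.
    """
    return min(candidates,
               key=lambda a: average_distortion(extract_features(a), codebook))


# Toy data (hypothetical): 2-D vectors standing in for MFCC feature vectors.
codebook = [[500.0, 1500.0], [300.0, 2300.0]]
raw = [[550.0, 1650.0], [330.0, 2530.0]]  # the codewords scaled by 1.1


def toy_extract(alpha):
    # Stand-in for recomputing features with a filter bank warped by alpha.
    return [[x / alpha for x in vec] for vec in raw]


candidates = [round(0.88 + 0.02 * i, 2) for i in range(13)]  # 0.88 ... 1.12
best = select_warping_factor(toy_extract, codebook, candidates)
print(best)
```

In a real system, `toy_extract` would be replaced by the full MFCC front end, and the codebook would come from a vector quantizer trained on reference speakers.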

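Much research has been devoted to speaker normalization, which improves recognition performance in speaker-independent speech recognizers by minimizing the effect of vocal tract length differences between speakers. Noting that speaker verification is possible with a vector quantizer, this study proposes a relatively simple linear-warping speaker normalization method that uses a vector quantizer. The proposed method first generates an optimal codebook for normalization and then uses this codebook to extract the speaker's linear warping factor; the extracted factor is used to warp the Mel-scale filter bank employed in Mel-cepstrum extraction. To verify the performance of the proposed warping factor extraction and application method, recognition experiments were carried out with a discrete-HMM recognizer of thirteen monosyllabic Korean numbers. The experiments showed a reduction of about 29% in the recognition error rate, confirming that the proposed speaker normalization method is both simpler than other line-search warping factor extraction methods and practically useful.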
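
The linear warping of the Mel-scale filter bank mentioned in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: it uses the common Mel formula 2595 log10(1 + f/700), scales each filter's centre frequency by the warping factor `alpha`, and clips at the upper band edge. The filter count, band edges, clipping rule, and the direction of the warp (multiplying rather than dividing by `alpha`) are assumptions, and the function name `warped_center_frequencies` is hypothetical.

```python
import math


def hz_to_mel(f):
    """Convert frequency in Hz to the Mel scale (common 2595/700 formula)."""
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def warped_center_frequencies(n_filters, f_low, f_high, alpha):
    """Centre frequencies (Hz) of a Mel filter bank after linear warping.

    Centres are spaced uniformly on the Mel scale between f_low and f_high,
    then scaled by alpha and clipped to f_high (assumed boundary handling).
    """
    mel_low, mel_high = hz_to_mel(f_low), hz_to_mel(f_high)
    step = (mel_high - mel_low) / (n_filters + 1)
    centers = [mel_to_hz(mel_low + step * (i + 1)) for i in range(n_filters)]
    return [min(alpha * c, f_high) for c in centers]


# Example with assumed settings: 20 filters over 0-4000 Hz.
baseline = warped_center_frequencies(20, 0.0, 4000.0, 1.0)
warped = warped_center_frequencies(20, 0.0, 4000.0, 1.1)
for b, w in list(zip(baseline, warped))[:3]:
    print(round(b, 1), round(w, 1))
```

Only the centre frequencies are shown here; a complete front end would also rebuild the triangular filter shapes around the warped centres before computing the MFCCs.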
