Experiments on Extraction of Non-Parametric Warping Functions for Speaker Normalization

화자 정규화를 위한 비정형 워핑함수 도출에 관한 실험

  • 신옥근 (Division of IT Engineering, Korea Maritime University)
  • Published : 2005.07.01

Abstract

In this paper, experiments are conducted to extract a set of non-parametric warping functions in order to examine the characteristics of the warping among speakers' utterances. For this purpose, we used MFCC and LP spectra of vowels to choose a reference spectrum for each vowel as well as representative spectra for each speaker. These spectra are compared by DTW to give the warping functions of each speaker. The set of warping functions is then defined by clustering the warping functions of all the speakers. Noting that the male and female warping functions have shapes similar to a piecewise linear function and a power function, respectively, a new hybrid set of warping functions is defined. The effectiveness of the extracted warping functions is evaluated by conducting phone-level recognition experiments, and improvements in accuracy are observed for both sets of warping functions.
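The two shapes named in the abstract can be illustrated with a minimal sketch. The abstract does not give the functional forms or parameter values used in the paper, so everything below is an assumption for illustration: the frequency axis is normalized to [0, 1], and the names `alpha`, `f0`, and `beta` are hypothetical. The piecewise linear form follows the common VTLN convention of scaling below a breakpoint and joining linearly to the upper band edge.

```python
import numpy as np

def piecewise_linear_warp(f, alpha, f0=0.8):
    """Scale by alpha below the breakpoint f0, then join linearly to (1, 1).

    alpha and f0 are illustrative parameters, not values from the paper.
    """
    f = np.asarray(f, dtype=float)
    upper = alpha * f0 + (1.0 - alpha * f0) * (f - f0) / (1.0 - f0)
    return np.where(f <= f0, alpha * f, upper)

def power_warp(f, beta):
    """Power-law warp f**beta on normalized frequency; fixes 0 and 1."""
    f = np.asarray(f, dtype=float)
    return f ** beta

# Evaluate both warps on a coarse normalized frequency grid.
freqs = np.linspace(0.0, 1.0, 6)
pl = piecewise_linear_warp(freqs, alpha=1.1)
pw = power_warp(freqs, beta=0.9)
```

Both functions keep the band edges fixed, which is what allows either one to be applied to a full-band spectrum without changing its bandwidth.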

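Experiments were conducted to derive non-parametric warping functions in order to examine the warping characteristics among speakers. To this end, representative spectra for each speaker and each phoneme were selected using the MFCC and LP spectra of vowels, and a reference spectrum was then chosen for each phoneme. The reference and representative spectra were compared by DTW over the full band of the spectrum to obtain a warping function for each speaker, and these functions were clustered to derive the set of non-parametric warping functions. Within this set, the functions of male and female speakers were observed to resemble piecewise linear functions and power functions, respectively, and on this basis a hybrid set of warping functions combining the two was defined. The recognition rates of the newly defined functions were tested in phoneme-level recognition experiments, and improved recognition rates were obtained with both function sets.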
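The DTW comparison described above can be sketched as follows. This is an illustrative sketch rather than the paper's procedure: the local distance (absolute difference), the step pattern (three symmetric steps), the synthetic single-peak spectra, and the helper names `dtw_path` and `warping_function` are all assumptions. The clustering step that follows in the paper is not shown.

```python
import numpy as np

def dtw_path(ref, rep):
    """Align two 1-D spectra by DTW and return the path as (ref_bin, rep_bin) pairs."""
    n, m = len(ref), len(rep)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(ref[i - 1] - rep[j - 1])  # assumed local distance
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    # Backtrack from the last cell to the first along the cheapest predecessors.
    i, j = n, m
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        steps = [(i - 1, j - 1), (i - 1, j), (i, j - 1)]
        i, j = min(steps, key=lambda s: cost[s])
        path.append((i - 1, j - 1))
    return path[::-1]

def warping_function(path, n_bins):
    """Map each reference bin to the mean of the bins it was aligned with."""
    total = np.zeros(n_bins)
    count = np.zeros(n_bins)
    for i, j in path:
        total[i] += j
        count[i] += 1
    return total / count

# Synthetic spectra: the same peak, shifted from bin 20 to bin 24.
bins = np.arange(64)
ref = np.exp(-0.5 * ((bins - 20) / 4.0) ** 2)
rep = np.exp(-0.5 * ((bins - 24) / 4.0) ** 2)
path = dtw_path(ref, rep)
warp = warping_function(path, len(ref))
```

Because the alignment runs over all bins, from the first to the last, the resulting `warp` array is a full-band, non-decreasing mapping from reference bins to speaker bins, with no parametric form imposed on it.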
