http://dx.doi.org/10.13064/KSSS.2014.6.4.133

Bilingual Voice Conversion Using Frequency Warping on Formant Space  

Chae, Yi-Geun (Kongju National University)
Yun, Young-Sun (Hannam University)
Jung, Jin Man (Hannam University)
Eun, Seongbae (Hannam University)
Publication Information
Phonetics and Speech Sciences / v.6, no.4, 2014, pp. 133-139
Abstract
This paper describes several approaches to transforming one speaker's individuality into another's by frequency warping between bilingual formant frequencies in different language environments. The proposed methods are simple, intuitive voice conversion algorithms that do not use training data between the different languages. Each approach finds a warping function from the source speaker's frequencies to the target speaker's frequencies in formant space, where the formant space comprises four representative monophthongs for each language. The warping functions can be represented by piecewise linear equations or by an inverse matrix. The features used are pure frequency components, including magnitudes, phases, and line spectral frequencies (LSF). The experiments show that the LSF-based voice conversion methods perform better than the other methods.
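To make the piecewise linear form of the warping function concrete, the following is a minimal sketch in Python. It is not the authors' implementation: the abstract does not say how anchor points are derived from the four monophthongs, so the anchor frequencies, the fixed endpoints at 0 Hz and the Nyquist frequency, and the function name `make_piecewise_linear_warp` are all assumptions made for illustration.

```python
from bisect import bisect_right


def make_piecewise_linear_warp(src_anchors, tgt_anchors, nyquist):
    """Return w(f), mapping a source frequency (Hz) to a target frequency (Hz).

    Anchors are matched source/target frequency pairs (assumed here to be
    formant frequencies). 0 Hz and the Nyquist frequency are added as fixed
    endpoints so that the warp covers the whole band.
    """
    if len(src_anchors) != len(tgt_anchors):
        raise ValueError("anchor lists must have the same length")
    xs = [0.0] + [float(f) for f in src_anchors] + [float(nyquist)]
    ys = [0.0] + [float(f) for f in tgt_anchors] + [float(nyquist)]
    # Both sequences must increase strictly so that the warp is monotonic.
    for seq in (xs, ys):
        if any(b <= a for a, b in zip(seq, seq[1:])):
            raise ValueError("anchors must be strictly increasing within (0, nyquist)")

    def warp(f):
        f = min(max(float(f), 0.0), xs[-1])            # clamp to [0, nyquist]
        i = min(bisect_right(xs, f) - 1, len(xs) - 2)  # index of the segment containing f
        t = (f - xs[i]) / (xs[i + 1] - xs[i])          # relative position inside the segment
        return ys[i] + t * (ys[i + 1] - ys[i])         # linear interpolation

    return warp


# Hypothetical anchor frequencies in Hz (illustrative values, not from the paper).
source_formants = [500.0, 1500.0, 2500.0]
target_formants = [600.0, 1800.0, 2700.0]
warp = make_piecewise_linear_warp(source_formants, target_formants, nyquist=8000.0)
```

With these assumed anchors, each source anchor maps exactly onto its target anchor, and frequencies between anchors are interpolated linearly. Applying `warp` to the frequency axis of a magnitude spectrum, or to LSF values, would be one way to use such a function, though which features are warped and how is a design choice the abstract only outlines.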
Keywords
Bilingual Voice Conversion; Formant Space; Frequency Warping; LSF-based Voice Conversion;