http://dx.doi.org/10.7776/ASK.2012.31.1.039

Feature Selection-based Voice Transformation  

Lee, Ki-Seung (Department of Electronic Engineering, Konkuk University)
Abstract
A voice transformation (VT) method that makes the utterance of a source speaker mimic that of a target speaker is described. Speaker individuality is transformed by altering three feature parameters: the LPC cepstrum, the pitch period, and the gain. The main objective of this study is to construct an optimal sequence of features selected from a target speaker's database, so as to maximize both the correlation probabilities between the transformed and source features and the likelihood of the transformed features with respect to the target model. A set of two-pass conversion rules is proposed: the feature parameters are selected from the database in the first pass, and the optimal sequence of feature parameters is constructed in the second pass. The conversion rules were developed using a statistical approach based on a maximum likelihood criterion. In constructing the optimal feature sequence, a hidden Markov model (HMM) was employed to find the most likely combination of features with respect to the target speaker's model. The effectiveness of the proposed transformation method was evaluated using objective tests and informal listening tests. We confirmed that the proposed method yields perceptually preferred results compared with conventional methods.
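The two-pass scheme in the abstract can be illustrated with a minimal sketch: pass 1 selects candidate frames from a target database by distance to each source frame, and pass 2 runs a Viterbi search that trades off closeness to the source against smoothness between consecutive target frames. This is only an illustrative stand-in, not the paper's implementation: the real method uses LPC cepstrum, pitch, and gain features, statistically trained conversion rules, and an HMM likelihood for the target model, whereas here Euclidean distances, the weight `w`, and the smoothness term are all simplifying assumptions.

```python
def select_candidates(source_frames, target_db, k=3):
    """Pass 1: for each source frame, keep the k nearest target-database
    frames by squared Euclidean distance (a stand-in for the paper's
    statistically trained selection rule)."""
    cands = []
    for s in source_frames:
        ranked = sorted(target_db,
                        key=lambda t: sum((a - b) ** 2 for a, b in zip(s, t)))
        cands.append(ranked[:k])
    return cands

def viterbi_sequence(source_frames, candidates, w=0.7):
    """Pass 2: Viterbi search over the candidate frames. The path cost
    combines closeness to the source frame (weight w) with smoothness
    between consecutive target frames (weight 1-w), a crude proxy for
    the target-model likelihood used in the paper."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    n = len(source_frames)
    # best[i][j] = (accumulated cost, backpointer) for candidate j at frame i
    best = [[(w * dist(source_frames[0], c), -1) for c in candidates[0]]]
    for i in range(1, n):
        row = []
        for c in candidates[i]:
            emit = w * dist(source_frames[i], c)
            costs = [best[i - 1][p][0] + (1 - w) * dist(candidates[i - 1][p], c)
                     for p in range(len(candidates[i - 1]))]
            p_best = min(range(len(costs)), key=costs.__getitem__)
            row.append((emit + costs[p_best], p_best))
        best.append(row)
    # Backtrack the lowest-cost path.
    j = min(range(len(best[-1])), key=lambda q: best[-1][q][0])
    path = [j]
    for i in range(n - 1, 0, -1):
        j = best[i][j][1]
        path.append(j)
    path.reverse()
    return [candidates[i][path[i]] for i in range(n)]
```

With toy 1-D "frames", the search picks the target frames that track the source while staying smooth, e.g. `viterbi_sequence([(0.0,), (1.0,), (2.0,)], select_candidates([(0.0,), (1.0,), (2.0,)], [(0.1,), (0.9,), (2.1,), (5.0,)], k=2))` returns `[(0.1,), (0.9,), (2.1,)]`.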
Keywords
Voice conversion; Unit selection; Hidden Markov model;
References
1 S. J. Cox and J. S. Bridle, "Unsupervised speaker adaptation by probabilistic spectrum fitting," in Proc. IEEE ICASSP, pp. 294-297, 1989.
2 D. G. Childers, B. Yegnanarayana and Ke Wu, "Voice Conversion: Factors responsible for quality," in Proc. IEEE ICASSP, pp. 748-751, 1985.
3 Y. Linde, A. Buzo and R. M. Gray, "An algorithm for vector quantizer design," IEEE Trans. on Communications, vol. 28, no. 1, pp. 84-95, 1980.
4 M. Beutnagel, A. Conkie, J. Schroeter, Y. Stylianou and A. Syrdal, "The AT&T Next-Gen TTS system," in Proc. Joint Meeting of ASA, EAA, and DAGA, Berlin, Germany, March 1999.
5 L. R. Rabiner and R. W. Schafer, Digital Processing of Speech Signals, Prentice-Hall, 1978.
6 G. M. White and R. B. Neely, "Speech recognition experiments with linear prediction, bandpass filtering, and dynamic programming," IEEE Trans. on Acoustics, Speech, and Signal Processing, vol. ASSP-24, no. 2, pp. 183-188, 1976.
7 S. Roucos and A. M. Wilgus, "High quality time-scale modification for speech," in Proc. IEEE ICASSP, pp. 493-496, 1985.
8 A. Q. Summerfield, "Lipreading and audio-visual speech perception," Philos. Trans. R. Soc. London B, vol. 335, pp. 71-78, 1992.
9 D. A. Reynolds and R. C. Rose, "Robust text-independent speaker identification using Gaussian mixture speaker models," IEEE Trans. on Speech and Audio Processing, vol. 3, no. 1, pp. 72-83, 1995.
10 Y. Stylianou, O. Cappe and E. Moulines, "Continuous probabilistic transform for voice conversion," IEEE Trans. on Speech and Audio Processing, vol. 6, no. 2, pp. 131-142, 1998.
11 N. Bi and Y. Qi, "Application of speech conversion to alaryngeal speech enhancement," IEEE Trans. on Speech and Audio Processing, vol. 5, no. 2, pp. 97-105, 1997.
12 L. M. Arslan, "Speaker transformation algorithm using segmental codebooks (STASC)," Speech Communication, vol. 28, pp. 211-226, 1999.
13 K. S. Lee, D. H. Youn and I. W. Cha, "A New voice personality transformation based on both linear and nonlinear prediction analysis," in Proc. ICSLP, pp. 1401-1404, 1996.
14 K. S. Lee, D. H. Youn and I. W. Cha, "Voice conversion using a low dimensional vector mapping," IEICE Trans. on Information and Systems, vol. E85-D, no. 8, pp. 1297-1305, 2002.
15 K. S. Lee, "Statistical approach for voice personality transformation," IEEE Trans. on Audio, Speech, and Language Processing, vol. 15, no. 2, pp. 641-651, 2007.
16 Z.-H. Jian and Y. Zhen, "Voice conversion using Viterbi algorithm based on Gaussian mixture model," in Proc. Intelligent Signal Processing and Communication Systems, pp. 32-35, 2007.
17 D. Sundermann, H. Hoge, A. Bonafonte, H. Ney, A. Black and S. Narayanan, "Text-independent voice conversion based on unit selection," in Proc. IEEE ICASSP, pp. 14-19, 2006.
18 D. Sundermann, H. Hoge, A. Bonafonte, H. Ney and A. W. Black, "Residual prediction based on unit selection," in Proc. IEEE Workshop on Automatic Speech Recognition and Understanding, pp. 369-374, 2005.
19 T. Dutoit, A. Holzapfel, M. Jottrand, A. Moinet, J. Perez and Y. Stylianou, "Towards a voice conversion system based on frame selection," in Proc. IEEE ICASSP, pp. 15-20, 2007.
20 M. Abe, S. Nakamura, K. Shikano and H. Kuwabara, "Voice conversion through vector quantization," in Proc. IEEE ICASSP, pp. 565-568, 1988.
21 M. Savic and I. H. Nam, "Voice personality transformation," Digital Signal Processing, vol. 4, pp. 107-110, 1991.
22 H. Valbret, E. Moulines and J. P. Tubach, "Voice transformation using PSOLA technique," Speech Communication, vol. 11, no. 2-3, pp. 175-187, 1992.
23 H. Mizuno and M. Abe, "Voice conversion algorithm based on piecewise linear conversion rules of formant frequency and spectral tilt," Speech Communication, vol. 16, no. 2, pp. 153-164, 1995.
24 M. Narendranath, H. A. Murthy, S. Rajendran and B. Yegnanarayana, "Transformation of formants for voice conversion using artificial neural networks," Speech Communication, vol. 16, no. 2, pp. 207-216, 1995.
25 N. Iwahashi and Y. Sagisaka, "Speech spectrum conversion based on speaker interpolation and multi-functional representation with weighting by radial basis function networks," Speech Communication, vol. 16, no. 2, pp. 139-152, 1995.