Feature Selection-based Voice Transformation

단위 선택 기반의 음성 변환 (Unit Selection-based Voice Transformation)

  • Lee, Ki-Seung (Department of Electronic Engineering, Konkuk University)
  • Received : 2011.11.29
  • Accepted : 2011.12.23
  • Published : 2012.01.31

Abstract

A voice transformation (VT) method that makes the utterance of a source speaker mimic that of a target speaker is described. Speaker individuality transformation is achieved by altering three feature parameters: the LPC cepstrum, pitch period, and gain. The main objective of this study is to construct an optimal sequence of features selected from a target speaker's database, maximizing both the correlation probabilities between the transformed and the source features and the likelihood of the transformed features with respect to the target model. A set of two-pass conversion rules is proposed: the feature parameters are first selected from the database, and the optimal sequence of the feature parameters is then constructed in the second pass. The conversion rules were developed using a statistical approach that employs a maximum likelihood criterion. In constructing the optimal sequence of features, a hidden Markov model (HMM) was employed to find the most likely combination of features with respect to the target speaker's model. The effectiveness of the proposed transformation method was evaluated using objective tests and informal listening tests. We confirmed that the proposed method leads to perceptually more preferred results than conventional methods.
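The two-pass scheme in the abstract can be sketched in code. This is not the authors' implementation; it is a minimal illustration in which pass 1 shortlists candidate target-speaker feature vectors for each source frame, and pass 2 runs a Viterbi-style search that trades off likelihood under a target model against a continuity (concatenation) cost between consecutive selections. The k-nearest-neighbor candidate step, the diagonal-Gaussian target model, and all names are illustrative assumptions.

```python
import numpy as np

def select_candidates(source_frames, target_db, k=5):
    """Pass 1 (assumed form): for each source frame, keep the k
    nearest feature vectors from the target speaker's database."""
    cands = []
    for f in source_frames:
        d = np.linalg.norm(target_db - f, axis=1)  # distance to every DB entry
        cands.append(target_db[np.argsort(d)[:k]])
    return cands

def viterbi_sequence(cands, mean, inv_var, w=1.0):
    """Pass 2: pick one candidate per frame maximizing the sum of
    log-likelihoods under a diagonal-Gaussian target model minus
    w times the squared jump between consecutive selections."""
    def loglik(x):
        return -0.5 * np.sum((x - mean) ** 2 * inv_var)

    T = len(cands)
    score = np.array([loglik(c) for c in cands[0]])  # scores at frame 0
    back = []                                        # backpointers per frame
    for t in range(1, T):
        cur = np.empty(len(cands[t]))
        ptr = np.empty(len(cands[t]), dtype=int)
        for j, c in enumerate(cands[t]):
            jump = np.sum((cands[t - 1] - c) ** 2, axis=1)  # concat cost
            tot = score - w * jump
            ptr[j] = int(np.argmax(tot))             # best predecessor
            cur[j] = tot[ptr[j]] + loglik(c)
        score, back = cur, back + [ptr]

    # Backtrack from the best final candidate to recover the sequence.
    j = int(np.argmax(score))
    path = [j]
    for ptr in reversed(back):
        j = int(ptr[j])
        path.append(j)
    path.reverse()
    return np.array([cands[t][path[t]] for t in range(T)])
```

In the paper the "features" are LPC cepstrum, pitch period, and gain, and the target model is an HMM rather than a single Gaussian; the sketch only shows the shape of the search, where increasing `w` favors smoother transitions at the expense of per-frame likelihood.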
