
Voice Personality Transformation Using a Probabilistic Method  

Lee Ki-Seung (School of Electronic Engineering, College of Information and Communications, Konkuk University)
Abstract
This paper addresses a voice personality transformation algorithm that makes one person's voice sound as if it were spoken by another person. In the proposed method, a speaker's voice is represented by the LPC cepstrum, the pitch period, and the speaking rate, and an appropriate transformation rule is constructed for each parameter. A Gaussian Mixture Model (GMM) is used to model one speaker's LPC cepstra, and a conditional probability is used to model the relationship between the two speakers' LPC cepstra. The parameters of each probabilistic model are obtained by Maximum Likelihood (ML) estimation. The transformed LPC cepstra are then obtained under a Minimum Mean Square Error (MMSE) criterion. The pitch period and the speaking rate serve as the parameters for prosody transformation, which is implemented using the ratio of the two speakers' average values. The proposed method outperforms the previous VQ-based method on objective measures, including the average cepstrum distance reduction ratio and the likelihood increasing ratio. In a subjective test, we obtained almost the same correct identification ratio as the previous method, and we also confirmed that the transformed speech is of high quality, which we attribute to spectral contours that evolve smoothly over time.
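The abstract describes the spectral mapping only in words. Below is a minimal sketch of GMM-based MMSE conversion under assumptions that are ours, not necessarily the paper's: the source-cepstrum GMM has diagonal covariances, and within mixture component i the target cepstrum y is assumed to have conditional mean A_i x + b_i. Under that model the MMSE estimate E[y | x] is the posterior-weighted sum of the per-component linear predictions. ML training of the parameters (for example with the EM algorithm) is not shown, and the names `MixtureComponent`, `posteriors`, and `mmse_convert` are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List

Vector = List[float]


@dataclass
class MixtureComponent:
    """One GMM component plus an assumed affine source-to-target regression."""
    weight: float      # mixture weight alpha_i of the source-cepstrum GMM
    mean: Vector       # mean of the source cepstrum x under component i
    var: Vector        # diagonal covariance of x (diagonal form is an assumption)
    A: List[Vector]    # hypothetical regression matrix: E[y | x, i] = A x + b
    b: Vector          # hypothetical regression offset


def log_gaussian_diag(x: Vector, mean: Vector, var: Vector) -> float:
    """Log density of a diagonal-covariance Gaussian evaluated at x."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def posteriors(x: Vector, comps: List[MixtureComponent]) -> List[float]:
    """P(i | x) for every component, computed stably in the log domain."""
    log_joint = [math.log(c.weight) + log_gaussian_diag(x, c.mean, c.var) for c in comps]
    top = max(log_joint)
    unnorm = [math.exp(lp - top) for lp in log_joint]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def mmse_convert(x: Vector, comps: List[MixtureComponent]) -> Vector:
    """MMSE estimate E[y | x] = sum_i P(i | x) * (A_i x + b_i)."""
    post = posteriors(x, comps)
    y = [0.0] * len(comps[0].b)
    for p, c in zip(post, comps):
        for k, (row, bk) in enumerate(zip(c.A, c.b)):
            y[k] += p * (sum(a * xj for a, xj in zip(row, x)) + bk)
    return y


# Toy 1-D example: two components whose regressions differ only in offset.
comps = [
    MixtureComponent(0.5, [-2.0], [0.1], [[1.0]], [0.0]),
    MixtureComponent(0.5, [2.0], [0.1], [[1.0]], [10.0]),
]
print(mmse_convert([0.0], comps))  # halfway between the two component predictions
```

Because P(i | x) varies continuously with x, neighbouring input frames map to neighbouring outputs, whereas a hard VQ codebook lookup jumps between code vectors; this is one plausible reading of why the abstract reports smoothly evolving spectral contours.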
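The prosody rule in the abstract, scaling by the ratio of the speakers' average values, can be sketched in a few lines. This is a minimal illustration under stated assumptions: pitch periods are scaled by mean(target) / mean(source); speaking rate is assumed to be measured in units per second (for example syllables/s), so the duration factor is the inverse ratio mean(source rate) / mean(target rate). The function names are hypothetical, and the actual waveform-level pitch and time-scale modification (for example with a PSOLA-type method) is not shown.

```python
from statistics import mean
from typing import List


def pitch_scale_factor(src_periods: List[float], tgt_periods: List[float]) -> float:
    """Ratio of the speakers' average pitch periods: mean(target) / mean(source)."""
    return mean(tgt_periods) / mean(src_periods)


def duration_scale_factor(src_rates: List[float], tgt_rates: List[float]) -> float:
    """Duration scale factor derived from average speaking rates.

    Assumption: rates are in units per second, so a slower target speaker
    (lower rate) needs proportionally longer durations.
    """
    return mean(src_rates) / mean(tgt_rates)


def scale(values: List[float], factor: float) -> List[float]:
    """Apply a multiplicative prosody factor to every value."""
    return [v * factor for v in values]


# Source pitch periods average 6 ms, target periods average 9 ms.
r_pitch = pitch_scale_factor([4.0, 6.0, 8.0], [9.0, 9.0])
print(scale([4.0, 6.0, 8.0], r_pitch))
```

After scaling, the converted pitch periods have the target speaker's average while preserving the source speaker's relative intonation contour.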
Keywords
Voice transformation; Maximum likelihood estimation
References
1 L. R. Rabiner and R. W. Schafer, Digital Processing of Speech Signals, (Prentice-Hall, 1978)
2 H. Valbret, E. Moulines, and J. P. Tubach, 'Voice transformation using PSOLA technique,' Speech Communication, 11, 175-187, 1992
3 Y. Stylianou, O. Cappé, and E. Moulines, 'Statistical methods for voice quality transformation,' Proc. of EUROSPEECH '95, Madrid, 447-450, 1995
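4 K.-S. Lee, 'Voice personality transformation using multiple-response classification and regression trees,' J. Acoust. Soc. Korea, 23 (3), 253-261, Apr. 2004 (in Korean)
5 K.-S. Lee, 'Voice personality transformation using optimal classification transformation,' J. Acoust. Soc. Korea, 23 (5), 400-409, Jul. 2004 (in Korean)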
6 S. Roucos and A. M. Wilgus, 'High quality time-scale modification for speech,' Proc. of ICASSP, 1, 493-496, 1985
7 Y. Linde, A. Buzo, and R. M. Gray, 'An algorithm for vector quantizer design,' IEEE Trans. on Communications, 28, 84-95, Jan. 1980
8 R. W. Dubnowski, R. W. Schafer, and L. R. Rabiner, 'Real-time digital hardware pitch detector,' IEEE Trans. on Acoustics, Speech, and Signal Processing, ASSP-24 (1), 2-8, Feb. 1976
9 A. Kain and M. W. Macon, 'Spectral voice conversion for text-to-speech synthesis,' Proc. of ICASSP, 1, 285-288, 1998
10 L. M. Arslan, 'Speaker transformation algorithm using segmental codebooks (STASC),' Speech Communication, 28, 211-226, 1999
11 E. Moulines and F. Charpentier, 'Pitch-synchronous waveform processing techniques for text-to-speech synthesis using diphones,' Speech Communication, 9 (5/6), 453-467, 1990
12 H. L. Van Trees, Detection, Estimation and Modulation Theory, (Part I), (Wiley, New York, 1968)
13 M. Abe, S. Nakamura, K. Shikano, and H. Kuwabara, 'Voice conversion through vector quantization,' Proc. of ICASSP, 1, 565-568, 1988
14 G. M. White and R. B. Neely, 'Speech recognition experiments with linear prediction, bandpass filtering, and dynamic programming,' IEEE Trans. on Acoustics, Speech, and Signal Processing, ASSP-24 (2), 183-188, Apr. 1976
15 A. Dempster, N. Laird, and D. Rubin, 'Maximum likelihood from incomplete data via the EM algorithm,' J. Royal Stat. Soc., Series B, 39 (1), 1-38, 1977