Voice Personality Transformation Using an Optimum Classification and Transformation

최적 분류 변환을 이용한 음성 개성 변환

  • Ki-Seung Lee (Department of Electronic Engineering, College of Information and Communication, Konkuk University)
  • Published : 2004.07.01

Abstract

In this paper, a voice personality transformation method is proposed that makes one person's voice sound like another person's voice. To transform the voice personality, the vocal tract transfer function is used as the transformation parameter. Compared with previous methods, the proposed method makes the transformed speech closer to the target speaker's voice from both subjective and objective points of view. Conversion between vocal tract transfer functions is implemented by classifying the entire vector space and then applying a linear transformation to each cluster. The LPC cepstrum is used as the feature parameter. A joint classification and transformation method is proposed, in which the optimum clusters and transformation matrices are estimated simultaneously under a minimum mean square error criterion. To evaluate the performance of the proposed method, transformation rules are generated from 150 sentences uttered by three male speakers and one female speaker. These rules are then applied to another 150 sentences uttered by the same speakers, and objective evaluation and subjective listening tests are performed.

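This paper proposes a voice conversion algorithm that transforms speech uttered by an arbitrary speaker so that it sounds as if it were uttered by another speaker. To convert the characteristics of an individual's voice, the characteristics of the vocal tract transfer function are used as the conversion parameter, and a new method is proposed to obtain converted speech that is closer to the target speaker's voice, both subjectively and objectively, than that of existing techniques. The conversion of the vocal tract transfer function is implemented by partitioning the entire feature vector space and then applying a linear transformation to each partition. The LPC cepstrum is used as the feature parameter, and a new classification-transformation algorithm is proposed that simultaneously optimizes the partitioning of the vector space and the estimation of the linear transformations. To evaluate the performance of the proposed voice conversion technique, conversion rules were generated from about 150 sentences collected from three male speakers and one female speaker, and these rules were applied to another 150 sentences uttered by the same speakers for objective performance evaluation and subjective listening tests.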
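
The idea of classifying the feature space and applying one linear transformation per cluster, with both chosen to reduce mean square error, can be illustrated with a minimal sketch. This is not the paper's algorithm, whose details are not given in the abstract. It is a simple alternating least-squares scheme written under several assumptions: `X` and `Y` are already time-aligned source and target feature vectors (for example LPC cepstra), each cluster uses an affine map, and conversion-time classification is by nearest centroid. All function names (`fit_affine`, `train_joint`, `convert`) and parameters are hypothetical.

```python
import numpy as np


def fit_affine(X, Y):
    """Least-squares affine map so that [X, 1] @ W approximates Y."""
    Xa = np.hstack([X, np.ones((len(X), 1))])
    W, *_ = np.linalg.lstsq(Xa, Y, rcond=None)
    return W  # shape (d_in + 1, d_out)


def apply_affine(W, X):
    """Apply an affine map produced by fit_affine."""
    return np.hstack([X, np.ones((len(X), 1))]) @ W


def nearest_centroid(X, centroids):
    """Index of the closest centroid (squared Euclidean) for each row of X."""
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def fit_per_cluster(X, Y, labels, n_clusters, W_fallback):
    """One affine map per cluster; empty clusters reuse the fallback map."""
    maps = []
    for k in range(n_clusters):
        mask = labels == k
        maps.append(fit_affine(X[mask], Y[mask]) if mask.any() else W_fallback)
    return maps


def train_joint(X, Y, n_clusters=4, n_iter=10, seed=0):
    """Alternate between fitting per-cluster maps and moving the clusters.

    Assumption: this alternating heuristic stands in for the joint
    optimisation described in the abstract; it is not taken from the paper.
    """
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=n_clusters, replace=False)].copy()
    W_global = fit_affine(X, Y)
    for _ in range(n_iter):
        # Step 1: classify source vectors, then fit a map for each cluster.
        labels = nearest_centroid(X, centroids)
        maps = fit_per_cluster(X, Y, labels, n_clusters, W_global)
        # Step 2: reassign each training pair to the map with the lowest error.
        errs = np.stack(
            [((apply_affine(W, X) - Y) ** 2).sum(axis=1) for W in maps], axis=1
        )
        labels = errs.argmin(axis=1)
        # Step 3: move each centroid to the mean of its reassigned sources.
        for k in range(n_clusters):
            if np.any(labels == k):
                centroids[k] = X[labels == k].mean(axis=0)
    # Final refit so the maps match the classifier used at conversion time.
    labels = nearest_centroid(X, centroids)
    maps = fit_per_cluster(X, Y, labels, n_clusters, W_global)
    return centroids, maps


def convert(X, centroids, maps):
    """Classify each source vector and apply its cluster's affine map."""
    labels = nearest_centroid(X, centroids)
    out = np.empty((len(X), maps[0].shape[1]))
    for k, W in enumerate(maps):
        mask = labels == k
        if mask.any():
            out[mask] = apply_affine(W, X[mask])
    return out
```

A typical call would be `centroids, maps = train_joint(X_train, Y_train)` followed by `convert(X_new, centroids, maps)`. The final refit is a design choice of this sketch: because the maps are fitted on the same nearest-centroid partition that `convert` uses, the training-set error cannot exceed that of a single global affine map.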
