• Title/Summary/Keyword: PSOLA

Development of Text-to-Speech System for PC (PC용 Text-to-Speech 시스템 개발)

  • Choi Muyeol;Hwang Cholgyu;Kim Soontae;Kim Junggon;Yi Sopae;Jang Seokbok;Pyo Kyungnan;Ahn Hyesun;Kim Hyung Soon
    • Proceedings of the Acoustical Society of Korea Conference / autumn / pp.41-44 / 1999
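  • This paper describes the development of a high-quality Korean text-to-speech (TTS) synthesis system for PC applications. For synthesis, the system adopts a sinusoidal-model-based method, which has advantages over the conventional PSOLA method in pitch control, smoothing of the joins between adjacent sounds, and voice-timbre control. For natural prosody modeling, it uses the classification and regression tree (CART) method, a statistical technique. To reduce discontinuities at phoneme boundaries, the synthesis units are initial consonant plus vowel units and final consonant units, and a timbre-control function allows a variety of voice timbres to be produced. The system is implemented as a TTS engine conforming to the standard Speech Application Program Interface (SAPI), which makes application development on the PC more convenient. Listening tests on the synthesized speech confirmed its high sound quality and the effectiveness of the timbre-control function.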

Synchronization of Synthetic Facial Image Sequences and Synthetic Speech for Virtual Reality (가상현실을 위한 합성얼굴 동영상과 합성음성의 동기구현)

  • 최장석;이기영
    • Journal of the Korean Institute of Telematics and Electronics S / v.35S no.7 / pp.95-102 / 1998
  • This paper proposes a method for synchronizing synthetic facial image sequences with synthetic speech. Speech is synthesized demi-syllable by demi-syllable using LP-PSOLA, and an inventory of 3,040 demi-syllables supports unlimited synthesis of Korean speech. For the facial image sequences, the paper defines a total of 11 fundamental lip-shape patterns for the Korean consonants and vowels; these fundamental lip shapes are sufficient to pronounce all Korean sentences. The image synthesis method assigns the fundamental lip shapes to key frames according to the initial, middle, and final sound of each syllable in the Korean input text, and interpolates the naturally changing lip shapes in the in-between frames. The number of in-between frames is estimated from the duration of each syllable of the synthetic speech, and this estimate is what synchronizes the facial image sequences with the speech. Speech synthesis requires disk storage for the 3,040 demi-syllables. Synthesis of the facial image sequences, however, requires storage for only one image, because all frames are synthesized from the neutral face. The method yields a synchronized system that can read Korean sentences aloud with synthetic speech and synthetic facial image sequences.
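
The frame-count estimate and in-between interpolation described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the 30 fps frame rate, the two-parameter lip shape, the use of linear interpolation, and the function names `inbetween_count` and `interpolate` are all assumptions made for illustration, since the abstract does not specify them.

```python
from typing import List, Sequence


def inbetween_count(duration_s: float, fps: float = 30.0, key_frames: int = 2) -> int:
    """Estimate how many in-between frames a syllable needs.

    The syllable duration (taken from the synthetic speech) is converted to a
    total frame count; the key frames are subtracted, never going below zero.
    The frame rate and key-frame count are assumed values.
    """
    total = round(duration_s * fps)
    return max(total - key_frames, 0)


def interpolate(a: Sequence[float], b: Sequence[float], n: int) -> List[List[float]]:
    """Return n evenly spaced shapes strictly between key shapes a and b.

    Linear interpolation is an assumption; the abstract only says the
    in-between lip shapes are interpolated.
    """
    frames = []
    for i in range(1, n + 1):
        t = i / (n + 1)
        frames.append([(1 - t) * x + t * y for x, y in zip(a, b)])
    return frames


# Hypothetical lip shapes described by (width, height) parameters.
closed = [1.0, 0.0]
opened = [0.5, 1.0]

n = inbetween_count(0.2)            # a 0.2 s syllable at the assumed 30 fps
mid = interpolate(closed, opened, n)
```

Because the frame count is derived from the speech duration, the image sequence for each syllable ends when its audio does, which is the synchronization property the abstract describes.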

A Study on Pitch Perception of Normal Korean (한국 성인 음성의 음도인식에 관한 연구)

  • Jeong, Ok-Ran;Kim, Hyung-Soon;Kim, Young-Tae;Sub, Jang-Su
    • Speech Sciences / v.1 / pp.315-323 / 1997
  • This study attempts to determine the fundamental frequency levels of male and female voices that Koreans perceive as normal. Seventy-three college students majoring in Speech Pathology participated in the study on a voluntary basis. The subjects listened to a male voice with fundamental frequencies of 60 Hz, 80 Hz, 100 Hz, 120 Hz, 140 Hz, 160 Hz, 180 Hz, and 200 Hz, and a female voice with fundamental frequencies of 140 Hz, 160 Hz, 180 Hz, 200 Hz, 220 Hz, 240 Hz, 260 Hz, and 280 Hz. The PSOLA (Pitch Synchronous Overlap and Add) method and a harmonic modeling method for speech signals were used to change the pitch in 20 Hz steps. The voices were presented in random order to prevent listener bias. The results were as follows. First, 46.6% judged the male voice at 120 Hz as normal, 19.2% judged 140 Hz as normal, and another 19.2% judged 160 Hz as normal. Second, 50.7% perceived the female voice at 220 Hz as normal, and 32.9% and 30.1% responded to 200 Hz and 240 Hz, respectively. Problems and recommendations for future investigation are discussed.
