• Title/Summary/Keyword: Speech Synthesis

A Study on Multi-Pulse Speech Coding Method by Using Individual Pitch Information (개별 피치정보를 이용한 멀티펄스 음성부호화 방식에 관한 연구)

  • Lee, See-Woo
    • The Journal of the Korea Contents Association / v.6 no.2 / pp.59-64 / 2006
  • In this paper, I propose a new multi-pulse coding method (IP-MPC) that uses individual pitch pulses in order to accommodate the changes in each pitch interval and to reduce pitch errors. The extraction rate of individual pitch pulses was 85% for female voices and 96% for male voices. I evaluated the MPC, which uses pitch information obtained by the autocorrelation method, against the IP-MPC, which uses individual pitch pulses. The speech synthesized by the IP-MPC was better in quality than the speech synthesized by the MPC.
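
The autocorrelation method mentioned in the abstract above can be illustrated with a minimal sketch. It is not the author's implementation: the function name `autocorr_pitch`, the 60-400 Hz search range, and the 8 kHz test tone are all assumptions chosen for illustration. The idea is to pick the lag at which the frame best matches a shifted copy of itself and to convert that lag to a frequency.

```python
import math

def autocorr_pitch(frame, sample_rate, f_min=60.0, f_max=400.0):
    """Estimate the pitch (Hz) of one frame from its autocorrelation peak.

    f_min and f_max bound the search range; they are illustrative
    defaults, not values taken from the paper.
    """
    n = len(frame)
    lag_min = int(sample_rate / f_max)
    lag_max = min(int(sample_rate / f_min), n - 1)
    best_lag, best_val = 0, 0.0
    for lag in range(lag_min, lag_max + 1):
        # Unnormalized autocorrelation at this lag.
        r = sum(frame[i] * frame[i + lag] for i in range(n - lag))
        if r > best_val:
            best_lag, best_val = lag, r
    if best_lag == 0:
        return 0.0  # no positive peak found (e.g. a silent frame)
    return sample_rate / best_lag

# Hypothetical test signal: a 100 Hz sine sampled at 8 kHz.
sample_rate = 8000
frame = [math.sin(2 * math.pi * 100.0 * t / sample_rate) for t in range(400)]
pitch = autocorr_pitch(frame, sample_rate)
```

A single estimate per frame cannot follow pitch changes inside the frame, which is the limitation that per-pulse (individual pitch) extraction is meant to address.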

A car number retrieving system using speech recognition for PDA (PDA상에서 음성인식을 이용한 차량번호 조회시스템)

  • 김우성;김동환;윤재선;홍광석
    • Proceedings of the Korea Institute of Convergence Signal Processing / 2001.06a / pp.281-284 / 2001
  • In this paper, we present a car number retrieval system for PDAs that uses speech recognition and speech synthesis. The system consists of 4-digit number recognition and command recognition, together with speech synthesis. Experimental results showed a recognition rate of 97% for 4-digit numbers and 99% for commands in speaker-independent operation.

Computerization and Application of the Korean Standard Pronunciation Rules (한국어 표준발음법의 전산화 및 응용)

  • 이계영;임재걸
    • Language and Information / v.7 no.2 / pp.81-101 / 2003
  • This paper introduces a computerized version of the Korean Standard Pronunciation Rules that can be used in speech engineering systems such as Korean speech synthesis and recognition systems. For this purpose, we build Petri net models for each item of the Standard Pronunciation Rules and then integrate them into a sound conversion table. The reverse of the Korean Standard Pronunciation Rules governs how sounds are mapped back to grammatically correct written characters. This paper presents not only the sound conversion table but also the character conversion table obtained by reversing the sound conversion table. Making use of these tables, we have implemented a Korean character-to-sound conversion system and a Korean sound-to-character conversion system, and tested them with various data sets reflecting all the items of the Standard Pronunciation Rules to verify the soundness and completeness of our tables. The test results show that the tables are sound and complete and that they also improve processing speed.

Speech synthesis engine for TTS (TTS 적용을 위한 음성합성엔진)

  • 이희만;김지영
    • The Journal of Korean Institute of Communications and Information Sciences / v.23 no.6 / pp.1443-1453 / 1998
  • This paper presents a speech synthesis engine that converts character strings held in computer memory into synthesized speech, enhancing intelligibility and naturalness by adapting the waveform processing method. The engine, which uses demisyllable speech segments, receives command streams for pitch modification and for duration and energy control. The command-based engine separates the high-level processing (text normalization, letter-to-sound conversion, and lexical analysis) from the low-level processing (signal filtering and pitch processing). The TTS (Text-to-Speech) system implemented with the engine has three independent object modules: the Text-Normalizer, the Commander, and the Speech Synthesis Engine, each of which can easily be replaced by another compatible module. Because of this mix-and-match nature, the architecture separating high-level and low-level processing has the advantages of expandability and portability.

A study on Speech Coding Method using V/S/TSIUVC Switching (V/S/TSIUVC 스위칭을 이용한 음성부호화 방식에 관한 연구)

  • Lee, See-Woo
    • Journal of the Korea Academia-Industrial cooperation Society / v.7 no.6 / pp.1180-1184 / 2006
  • In a speech coding system that uses voiced and unvoiced excitation sources, speech quality is distorted when voiced and unvoiced consonants occur in the same frame. In this paper, I propose a new multi-pulse coding method that uses V/S/TSIUVC switching and a TSIUVC approximation-synthesis method in order to limit this distortion. The TSIUVC is extracted by using the zero crossing rate and individual pitch pulses. The TSIUVC extraction rate was 91% for female voices and 96.2% for male voices. Importantly, a high-quality synthesis waveform can be produced within the TSIUVC from the frequency information below 0.547 kHz and above 2.813 kHz. I evaluated the MPC with V/UV switching and the FBD-MPC with V/S/TSIUVC switching. As a result, the speech synthesized by the FBD-MPC was better in quality than that of the MPC.
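
The zero crossing rate used for TSIUVC extraction can be sketched as follows. This is a generic definition, not the paper's procedure: the function name `zero_crossing_rate` is hypothetical, and the abstract does not state the decision threshold or how the rate is combined with individual pitch pulses, so neither is shown here.

```python
def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ.

    Noise-like (unvoiced) frames tend to give a high rate and
    voiced frames a low one. Samples equal to zero are treated
    as non-negative.
    """
    if len(frame) < 2:
        return 0.0
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(frame) - 1)

# Hypothetical toy frames for illustration.
rate_alternating = zero_crossing_rate([1, -1, 1, -1])  # sign flips at every step
rate_constant = zero_crossing_rate([1, 2, 3, 4])       # no sign flips
```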

Synthesis and Evaluation of Prosodically Exaggerated Utterances

  • Yoon, Kyu-Chul
    • Phonetics and Speech Sciences / v.1 no.3 / pp.73-85 / 2009
  • This paper introduces a technique for synthesizing and evaluating human utterances with exaggerated or atypical prosody. Prosody exaggeration can be implemented by manipulating the fundamental frequency (F0) contour, the segmental durations, or the intensity contour of an utterance, and two or more of these three prosodic elements can be exaggerated at the same time. Algorithms for synthesis and evaluation are proposed. Learner utterances exaggerated in each of the three prosodic features were evaluated against their original native versions in terms of the differences in their F0 contours, segmental durations, and intensity contours. The measure of difference was the Euclidean distance between the matching points in their F0 and intensity contours. The measure was calculated after the exaggerated learner utterances were aligned segment by segment and made identical to their native versions in terms of segmental durations. For the evaluation of the segmental durations, no prior modifications were made to the durations and the same measure was used. The results of the pilot experiment suggest that this measure is viable for evaluating learner utterances with atypical prosody against their native versions.
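
The distance measure described above can be sketched in a few lines. The function name `contour_distance` and the sample F0 values are hypothetical; the sketch assumes the two contours have already been duration-aligned so that points at the same index correspond to each other, as the abstract describes.

```python
import math

def contour_distance(contour_a, contour_b):
    """Euclidean distance between two contours sampled at matching points."""
    if len(contour_a) != len(contour_b):
        raise ValueError("contours must have the same number of points")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(contour_a, contour_b)))

# Hypothetical F0 values in Hz at three matching points.
native_f0 = [100.0, 110.0, 120.0]
learner_f0 = [103.0, 114.0, 120.0]
distance = contour_distance(native_f0, learner_f0)
```

The same function applies to intensity contours or to lists of segmental durations, since it only requires two equal-length sequences of numbers.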

The Role of Prosody in Dialect Synthesis and Authentication

  • Yoon, Kyu-Chul
    • Phonetics and Speech Sciences / v.1 no.1 / pp.25-31 / 2009
  • The purpose of this paper is to examine the viability of synthesizing the Masan dialect from the Seoul dialect and to examine the role of prosody in the authentication of the synthesized Masan dialect. The synthesis was performed by transferring one or more of the prosodic features of a Masan utterance onto a Seoul utterance. The hypothesis is that, given an utterance composed of phonemes shared by both dialects, the more prosodic features of the Masan utterance are transferred onto the Seoul utterance, the more authentic a Masan utterance it will be judged to be. The prosodic features involved were the fundamental frequency contour, the segmental durations, and the intensity contour. The synthesized Masan utterances were evaluated by thirteen native speakers of the Masan dialect. The results showed that the fundamental frequency contour and the segmental durations had main effects on the perceptual shift from the Seoul to the Masan dialect.

Overlap and Add Sinusoidal Synthesis Method of Speech Signal Using Phase Shaping Factor (위상 변환 인자가 적용된 음성의 중첩합산 정현파 합성 방법)

  • Park, Jong-Bae;Kim, Jong-Hark;Kim, Kyu-Jin;Yang, Yong-Ho;Lee, In-Sung
    • Proceedings of the IEEK Conference / 2007.07a / pp.409-410 / 2007
  • In this paper, we propose a new overlap-and-add synthesis method that applies a phase shaping factor within sinusoidal synthesis of speech signals, improving the continuity and the signal-to-noise ratio (SNR) of the synthesized speech.
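
For orientation, plain overlap-and-add synthesis can be sketched as below. This is the generic windowed technique only: the abstract does not define the phase shaping factor, so it is not modeled here, and the names `hann` and `overlap_add`, the Hann window, and the 50% overlap are assumptions made for illustration.

```python
import math

def hann(length):
    """Periodic Hann window; copies shifted by length/2 sum to 1."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / length) for n in range(length)]

def overlap_add(frames, hop):
    """Window each frame and add it into the output at multiples of hop."""
    length = len(frames[0])
    window = hann(length)
    out = [0.0] * (hop * (len(frames) - 1) + length)
    for k, frame in enumerate(frames):
        start = k * hop
        for n in range(length):
            out[start + n] += window[n] * frame[n]
    return out

# Four hypothetical frames of constant amplitude, 50% overlap.
frames = [[1.0] * 8 for _ in range(4)]
signal = overlap_add(frames, hop=4)
```

With constant-amplitude frames, the fully overlapped region of `signal` stays at the original amplitude, which is the continuity property that overlap-and-add is designed to provide at frame boundaries.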
