• Title/Summary/Keyword: Speech Synthesis

Design and Implementation of a Text-to-Speech System using the Prosody and Duration Information (운율 및 길이 정보를 이용한 무제한 음성 합성기의 설계 및 구현)

  • Yang, Jin-Seok;Kim, Jae-Beom;Lee, Jeong-Hyeon
    • The Transactions of the Korea Information Processing Society / v.3 no.5 / pp.1121-1129 / 1996
  • To produce more natural speech in a Text-to-Speech system, prosody and duration must be processed in advance; here, the prosody and duration information was extracted through trial-and-error experiments. This paper proposes a method that uses this information to improve naturalness in a Text-to-Speech system. As a result, the Text-to-Speech system proposed and implemented in this paper produced more natural synthesized speech than systems that do not use this information.


A Speech Translation System for Hotel Reservation (호텔예약을 위한 음성번역시스템)

  • 구명완;김재인;박상규;김우성;장두성;홍영국;장경애;김응인;강용범
    • The Journal of the Acoustical Society of Korea / v.15 no.4 / pp.24-31 / 1996
  • In this paper, we present KT-STS (Korea Telecom Speech Translation System), a speech translation system for hotel reservation. KT-STS is a speech-to-speech translation system that translates a spoken Korean utterance into Japanese. The system is designed around the task of hotel reservation (dialogues between a Korean customer and a hotel reservation desk in Japan). It consists of a Korean speech recognition system, a Korean-to-Japanese machine translation system, and a Korean speech synthesis system. The Korean speech recognition system is an HMM (hidden Markov model)-based, speaker-independent continuous speech recognizer with a vocabulary of about 300 words. A bigram language model is used as the forward language model, and a dependency grammar is used as the backward language model. For machine translation, we use dependency grammar and the direct transfer method. The Korean speech synthesizer uses demiphones as the synthesis unit, together with the method of periodic waveform analysis and reallocation. KT-STS runs in nearly real time on a SPARC20 workstation with one TMS320C30 DSP board. In speech recognition tests, we achieved a word recognition rate of 94.68% and a sentence recognition rate of 82.42%. In Korean-to-Japanese translation tests, we achieved a translation success rate of 100%. We also conducted an international joint experiment in which our system was connected over a leased line to another system developed by KDD in Japan.


A Study on Implementation of Emotional Speech Synthesis System using Variable Prosody Model (가변 운율 모델링을 이용한 고음질 감정 음성합성기 구현에 관한 연구)

  • Min, So-Yeon;Na, Deok-Su
    • Journal of the Korea Academia-Industrial cooperation Society / v.14 no.8 / pp.3992-3998 / 2013
  • This paper concerns a method of adding an emotional speech corpus to a high-quality, large-corpus-based speech synthesizer in order to generate varied synthesized speech. We built the emotional speech corpus in a form usable by a waveform-concatenation speech synthesizer, and implemented a synthesizer that generates varied synthesized speech through the same synthesis unit selection process as a normal speech synthesizer. A markup language is used for emotional input text. Emotional speech is generated when the input text matches the emotional speech corpus over the length of an intonation phrase; otherwise, normal speech is generated. The break indices (BIs) of emotional speech are more irregular than those of normal speech, which makes it difficult to use the BIs generated by the synthesizer as they are. To solve this problem, we applied Variable Break [3] modeling. A Japanese speech synthesizer was used for the experiments. As a result, we obtained natural emotional synthesized speech using the break prediction module for normal speech synthesis.

Improving LD-CELP using frame classification and modified synthesis filter (프레임 분류와 합성필터의 변형을 이용한 적은 지연을 갖는 음성 부호화기의 성능)

  • 임은희;이주호;김형명
    • The Journal of Korean Institute of Communications and Information Sciences / v.21 no.6 / pp.1430-1437 / 1996
  • A low-delay code-excited linear predictive (LD-CELP) speech coder at bit rates under 8 kbps is considered. We try to improve the performance of the speech coder by modifying the synthesis filter depending on the frame type. We first classify frames into three groups: voiced, unvoiced, and onset. For voiced and unvoiced frames, the spectral envelope of the synthesis filter is adapted to the phonetic characteristics. For transition frames from unvoiced to voiced, a synthesis filter interpolated with the bias filter is used. The proposed vocoder produced clearer sound than existing LD-CELP vocoders at a similar delay level.

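The three-way frame classification described in the abstract above can be illustrated with a minimal Python sketch. The abstract does not say how frames are classified, so everything here is an assumption: the voicing decision uses short-time energy and zero-crossing rate, the thresholds are arbitrary, and "onset" is taken to mean a voiced frame that follows an unvoiced one. All function names are hypothetical.

```python
def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def is_voiced(frame, energy_threshold=0.01, zcr_threshold=0.25):
    """Assumed voicing rule: high energy and low zero-crossing rate."""
    energy = sum(x * x for x in frame) / len(frame)
    return energy > energy_threshold and zero_crossing_rate(frame) < zcr_threshold


def classify_frames(frames):
    """Label each frame as 'voiced', 'unvoiced', or 'onset'.

    Assumption: an onset frame is a voiced frame whose predecessor was
    unvoiced; the signal is treated as unvoiced before the first frame.
    """
    labels = []
    previous_voiced = False
    for frame in frames:
        voiced = is_voiced(frame)
        if not voiced:
            labels.append("unvoiced")
        elif previous_voiced:
            labels.append("voiced")
        else:
            labels.append("onset")
        previous_voiced = voiced
    return labels
```

In a coder of the kind the abstract describes, the label would then select which synthesis filter modification to apply to the frame; that step is not sketched here because the abstract gives no filter details.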

The Boundary Tones in Korean Intonational Phrases (한국어 억양구의 경계톤)

  • Han, Sun-Hee;Oh, Mi-Ra
    • Speech Sciences / v.5 no.2 / pp.109-129 / 1999
  • The study of boundary tones, which are realized on the final syllable of an Intonational Phrase, is important because sentential meaning in Korean is often differentiated solely by the use of different boundary tones. The purposes of this paper are threefold. First, it aims to identify how the characteristics of boundary tones differ between a designed corpus and natural speech. Second, it shows that gender and dialectal differences are crucial factors in determining how boundary tones are realized. Finally, it provides a basis for better speech synthesis and speech recognition through analysis of the morphemes on which boundary tones are realized. The study shows that nine different kinds of boundary tones are realized, depending on contextual, gender, and dialectal differences. In addition to the boundary tones suggested in Jun (1993), three more boundary tones are introduced: L-%, H-%, and LHLH%.


Harmonic Peak Picking-based MVF Estimation for Improvement of HMM-based Speech Synthesis System Using TBE Model (TBE 모델을 사용하는 HMM 기반 음성합성기 성능 향상을 위한 하모닉 선택에 기반한 MVF 예측 방법)

  • Park, Jihoon;Hahn, Minsoo
    • Phonetics and Speech Sciences / v.4 no.4 / pp.79-86 / 2012
  • In the two-band excitation (TBE) model, maximum voiced frequency (MVF) is the most important feature of the excitation parameter because the synthetic speech quality depends on MVF. Thus, this paper proposes an enhanced MVF estimation scheme based on the peak picking method. In the proposed scheme, the local peak and the peak lobe are picked from the spectrum of a linear predictive residual signal. The normalized distance between neighboring peak lobes is calculated and utilized as a feature to estimate MVF. Experimental results of both objective and subjective tests show that the proposed scheme improves synthetic speech quality compared with that of the conventional one.
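The idea of using the normalized distance between neighboring spectral peaks as a cue for MVF can be illustrated with a rough Python sketch. This is not the scheme from the paper: the abstract gives no details on peak lobes or on the decision rule, so the sketch assumes a plain local-maximum picker, normalizes peak spacing by the fundamental frequency expressed in bins, and takes MVF as the last peak before the spacing stops looking harmonic. The function names, the `tolerance` parameter, and the assumption that the first peak is a harmonic are all hypothetical.

```python
def local_peaks(magnitudes, min_height=0.0):
    """Indices of local maxima of a magnitude spectrum above min_height."""
    return [
        k
        for k in range(1, len(magnitudes) - 1)
        if magnitudes[k] > magnitudes[k - 1]
        and magnitudes[k] >= magnitudes[k + 1]
        and magnitudes[k] > min_height
    ]


def normalized_peak_distances(peaks, f0_bins):
    """Spacing between neighboring peaks, in units of the fundamental."""
    return [(b - a) / f0_bins for a, b in zip(peaks, peaks[1:])]


def estimate_mvf(magnitudes, f0_bins, bin_hz, tolerance=0.2, min_height=0.0):
    """Assumed rule: MVF is the frequency of the last peak reached while
    neighboring peaks stay about one fundamental apart."""
    peaks = local_peaks(magnitudes, min_height)
    if not peaks:
        return 0.0
    last_harmonic = peaks[0]  # assumption: the first peak is a harmonic
    distances = normalized_peak_distances(peaks, f0_bins)
    for peak, distance in zip(peaks[1:], distances):
        if abs(distance - 1.0) > tolerance:
            break  # spacing is no longer harmonic: treat the rest as noise
        last_harmonic = peak
    return last_harmonic * bin_hz
```

In the setting of the abstract, `magnitudes` would be the spectrum of the linear predictive residual signal for one frame; computing that residual is outside the scope of this sketch.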

ToBI Based Prosodic Representation of the Kyungnam Dialect of Korean

  • Cho, Yong-Hyung
    • Speech Sciences / v.2 / pp.159-172 / 1997
  • This paper proposes a prosodic representation system for the Kyungnam dialect of Korean, based on the ToBI system. In this system, diverse intonation patterns are transcribed on four parallel tiers: a tone tier, a break index tier, an orthographic tier, and a miscellaneous tier. The tone tier employs pitch accents, phrase accents, and boundary tones marked with diacritics to represent various pitch events. The break index tier uses five break indices, numbered from 0 to 4, to represent degrees of connectedness in speech by associating each inter-word position with a break index. Each break index represents a boundary of some kind of constituent. This system can contribute not only to a more detailed theory connecting prosody, syntax, and intonation, but also to current text-to-speech synthesis approaches, speech recognition, and other quantitative computational modeling.

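The break index tier described in the abstract above, with one index from 0 to 4 at each inter-word position, maps naturally onto a simple data structure. The following Python sketch assumes that representation and shows one possible use: splitting an utterance into constituents wherever the break index reaches a chosen level. The function name, the `min_break` parameter, and the placeholder words are hypothetical and not taken from the paper.

```python
def split_at_breaks(words, break_indices, min_break=3):
    """Group words into constituents bounded by break indices >= min_break.

    break_indices[i] is the break index between words[i] and words[i + 1],
    so an utterance of n words carries n - 1 break indices.
    """
    if not words:
        return []
    if len(break_indices) != len(words) - 1:
        raise ValueError("expected one break index per inter-word position")
    if any(b not in range(5) for b in break_indices):
        raise ValueError("break indices must be integers from 0 to 4")

    phrases = []
    current = [words[0]]
    for word, break_index in zip(words[1:], break_indices):
        if break_index >= min_break:
            phrases.append(current)  # boundary strong enough: close the phrase
            current = []
        current.append(word)
    phrases.append(current)
    return phrases
```

For example, with five placeholder words and break indices `[1, 3, 0, 4]`, the default threshold of 3 groups the words as two words, two words, and one word. Lowering `min_break` yields smaller constituents, which reflects the idea that each break index level marks the boundary of some kind of constituent.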