• Title/Summary/Keyword: Speech Synthesis

Search Result 381

An HMM-based Korean TTS synthesis system using phrase information (운율 경계 정보를 이용한 HMM 기반의 한국어 음성합성 시스템)

  • Joo, Young-Seon;Jung, Chi-Sang;Kang, Hong-Goo
    • Proceedings of the Korean Society of Broadcast Engineers Conference / 2011.07a / pp.89-91 / 2011
  • In this paper, phrase boundaries in sentences are predicted, and the predicted phrase break information is applied to an HMM-based Korean text-to-speech (TTS) synthesis system. Synthesis with phrase break information improves both the naturalness of the synthetic speech and the comprehensibility of the sentences. To predict the phrase boundaries, context-dependent features are used: the forward/backward POS (part-of-speech) of each eojeol, the position of the eojeol in the sentence, the length of the eojeol, and the presence or absence of punctuation marks. Experimental results show that phrase break information increases the naturalness of the synthetic speech.
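
The abstract names the context features used to predict phrase boundaries. As a hedged illustration only, the Python sketch that follows assembles such a feature set per eojeol. The `Eojeol` class, the `phrase_break_features` function, the POS tag strings, and the reading of "forward/backward POS" as the tags of the preceding and following eojeols are assumptions, not details from the paper; the resulting dictionaries would then feed whatever boundary classifier is used.

```python
from dataclasses import dataclass

PUNCTUATION = ".,?!"  # assumed set of punctuation marks


@dataclass
class Eojeol:
    """Hypothetical container for one eojeol (space-delimited Korean word unit)."""
    text: str
    pos: str  # POS tag sequence for the eojeol; the tag set is an assumption


def phrase_break_features(eojeols):
    """Return one context-feature dict per eojeol for a phrase-break classifier."""
    n = len(eojeols)
    features = []
    for i, e in enumerate(eojeols):
        features.append({
            "pos": e.pos,
            # Neighboring POS tags, padded at the sentence boundaries
            "prev_pos": eojeols[i - 1].pos if i > 0 else "<BOS>",
            "next_pos": eojeols[i + 1].pos if i < n - 1 else "<EOS>",
            # Relative position of the eojeol in the sentence, 0.0 to 1.0
            "position": i / (n - 1) if n > 1 else 0.0,
            # Length in characters (Hangul syllables), ignoring trailing punctuation
            "length": len(e.text.rstrip(PUNCTUATION)),
            # Whether the eojeol ends with a punctuation mark
            "has_punct": e.text.endswith(tuple(PUNCTUATION)),
        })
    return features
```

For a three-eojeol sentence such as "나는 학교에 간다.", the last eojeol gets `next_pos="<EOS>"`, `length=2`, and `has_punct=True`.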

Development of an algorithm for the control of prosodic factors to synthesize unlimited isolated words in the time domain (시간 영역에서의 무제한 고립어 합성을 위한 운율 요소 제어용 알고리즘 개발)

  • 강찬희
    • Journal of the Korean Institute of Telematics and Electronics C / v.35C no.7 / pp.59-68 / 1998
  • This paper develops an algorithm for unlimited-vocabulary Korean speech synthesis. We present the results of controlling prosodic factors in the time domain, using isolated words as the basic synthesis unit. Using a new pitch-synchronous, parametric speech synthesis method in the time domain, we mainly present the results of controlling prosodic factors such as pitch periods, energy envelopes, and durations, together with an evaluation of synthetic speech quality. Connected words can also be synthesized by controlling a continuous, unified prosody, which improves naturalness. Experimental results show that pitch discontinuities and energy zeroing at the junctions between speech waveforms are also reduced. In particular, speech can be synthesized with unlimited durations and tones. In addition, noisiness and clarity improve because the degradation caused by phase distortion from discontinuities at the waveform junctions is reduced.

On a Cepstral Pitch Alteration Technique for Prosody Control in the Speech Synthesis System with High Quality

  • Kim, Kyu-Hong;Baek, Seong-Joon;Bae, Myung-Jin
    • The Journal of the Acoustical Society of Korea / v.18 no.1E / pp.32-36 / 1999
  • Among speech synthesis techniques, waveform coding methods preserve the intelligibility and naturalness of synthetic speech. To apply waveform coding techniques to synthesis by rule, the pitch of the synthetic speech must be alterable. In this paper, we propose a new pitch alteration method that uses time scaling in the time domain to compensate for the phase distortion of the cepstral pitch alteration method. This method removes some of the spectral distortion that occurs at the junctions between waveforms. For performance evaluation, the spectral distortion rate was used as the objective criterion and the MOS (Mean Opinion Score) as the subjective criterion. The method achieved a spectral distortion of 0.66% and an MOS of 3.9.

Synthesis-by-rule of Korean: Part II - Speech Synthesis Using the Units of Demisyllables (우리말 규칙합성에 관한 연구 (II) - 반음절 단위의 음성합성)

  • Cheon, Kang-Sik;Lee, Sung-Jun;Lee, Jae-Hong
    • Proceedings of the KIEE Conference / 1988.07a / pp.29-32 / 1988
  • A new set of demisyllable units is presented for Korean speech synthesis. The demisyllable unit set is compared with a syllable unit set in terms of the quality of the speech synthesized with each set and the amount of computer memory each set occupies. The demisyllable unit set achieves comparable speech quality while occupying less memory than the syllable unit set.

On a Pitch Alteration Method using Scaling the Harmonics Compensated with the Phase for Speech Synthesis (위상 보상된 고조파 스케일링에 의한 음성합성용 피치변경법)

  • Bae, Myung-Jin
    • The Journal of the Acoustical Society of Korea / v.13 no.6 / pp.91-97 / 1994
  • In speech processing, waveform coding is concerned simply with preserving the signal waveform through redundancy reduction. In speech synthesis, high-quality waveform coding is mainly used for synthesis by analysis. Because the parameters of this coding are not separated into excitation and vocal tract parameters, it is difficult to apply waveform coding to synthesis by rule; doing so requires the ability to alter the pitch. In this paper, we propose a new pitch alteration method that changes the pitch period in waveform coding by dividing the speech signal into vocal tract and excitation parameters. It is a time-frequency domain method that preserves the phase component of the waveform in the time domain and the magnitude component in the frequency domain. This makes it possible to use waveform coding for synthesis by rule. With this algorithm, the spectral distortion is 2.94%, which is 5.06% lower than that of the time-domain pitch alteration method.

On a Pitch Alteration Method Compensated with the Spectrum for High Quality Speech Synthesis (스펙트럼 보상된 고음질 합성용 피치 변경법)

  • 문효정
    • Proceedings of the Acoustical Society of Korea Conference / 1995.06a / pp.123-126 / 1995
  • Waveform coding is concerned simply with preserving the shape of the speech signal through redundancy reduction. In speech synthesis, high-quality waveform coding is mainly used for synthesis by analysis. However, because the parameters of this coding are not separated into excitation and vocal tract parameters, it is difficult to apply waveform coding to synthesis by rule. In this paper, we propose a new pitch alteration method that changes the pitch period in waveform coding by scaling the time axis and compensating the spectrum. It is a time-frequency domain method that preserves the phase components of the waveform and yields little spectral distortion: 2.5% or less for a 50% pitch change.
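
The time-axis scaling step can be pictured with a short sketch. Assuming the signal has already been split into pitch periods (pitch marking is not shown), each period is resampled to a new length by linear interpolation. This changes the pitch period, but it also stretches or compresses the spectral envelope, which is the kind of distortion the paper's spectrum compensation addresses; that compensation is not reproduced here, and `scale_period` and `change_pitch` are hypothetical names.

```python
def scale_period(period, new_len):
    """Resample one pitch period to new_len samples by linear interpolation."""
    old_len = len(period)
    if old_len < 2 or new_len < 1:
        raise ValueError("need at least 2 input samples and 1 output sample")
    out = []
    for k in range(new_len):
        # Map output index k onto the original time axis [0, old_len)
        t = k * old_len / new_len
        i = int(t)
        frac = t - i
        # Treat the period as cyclic, so the last sample interpolates toward the first
        nxt = period[i + 1] if i + 1 < old_len else period[0]
        out.append((1.0 - frac) * period[i] + frac * nxt)
    return out


def change_pitch(periods, factor):
    """Scale every pitch period length by `factor` (>1 lowers pitch, <1 raises it)."""
    return [scale_period(p, max(1, round(len(p) * factor))) for p in periods]
```

A factor of 1.5 lengthens each period by 50%, lowering the fundamental frequency to two thirds of its original value.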

A Study on the Human Auditory Scaling (인간의 청각 척도에 관한 고찰)

  • Yang, Byung-Gon
    • Speech Sciences / v.2 / pp.125-134 / 1997
  • Human beings can perceive various aspects of sound, including loudness, pitch, length, and timbre. Many recent studies have sought to clarify the complex auditory scales of the human ear. This study critically reviews some of these scales (decibel, sone, and phon for loudness perception; mel and bark for pitch) and proposes applying them to normalize the acoustic correlates of human speech. One of the most important aspects of human auditory perception is its nonlinearity, which should be incorporated into linear speech analysis and synthesis systems. Further studies using more sophisticated equipment are needed to refine these scales by analyzing human auditory perception of complex tones or speech. Such work should help researchers develop better speech recognition and synthesis devices.
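
Several of the scales mentioned have widely used closed-form approximations. The sketch that follows uses the common 2595·log10(1 + f/700) mel formula, Traunmüller's Bark approximation, the 2^((P - 40)/10) phon-to-sone rule (a good approximation from about 40 phon upward), and dB SPL relative to 20 µPa. The review itself may use other variants, so treat these as illustrative choices rather than the author's definitions.

```python
import math


def hz_to_mel(f_hz):
    """Mel scale, common 2595*log10(1 + f/700) form."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def hz_to_bark(f_hz):
    """Bark (critical-band) scale, Traunmueller (1990) approximation."""
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53


def phon_to_sone(phon):
    """Loudness in sones from loudness level in phons: 40 phon = 1 sone, +10 phon doubles it."""
    if phon < 40:
        raise ValueError("the doubling rule is only a good approximation from about 40 phon up")
    return 2.0 ** ((phon - 40.0) / 10.0)


def spl_db(pressure_pa, reference_pa=20e-6):
    """Sound pressure level in dB relative to 20 micropascals."""
    return 20.0 * math.log10(pressure_pa / reference_pa)
```

Converting formant frequencies with `hz_to_bark` before comparing speakers is one way to apply the kind of normalization the abstract proposes; this is an illustrative use, not a procedure taken from the paper.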

Corpus-based Korean Text-to-speech Conversion System (콜퍼스에 기반한 한국어 문장/음성변환 시스템)

  • Kim, Sang-hun;Park, Jun;Lee, Young-jik
    • The Journal of the Acoustical Society of Korea / v.20 no.3 / pp.24-33 / 2001
  • This paper describes a baseline implementation of a corpus-based Korean TTS system. Conventional TTS systems that use small speech databases still generate machine-like synthetic speech. To overcome this problem, we introduce a corpus-based TTS system that can generate natural synthetic speech without prosodic modification. The corpus should contain source speech with natural prosody and multiple instances of each synthesis unit. To build phone-level synthesis units, we train a speech recognizer on the target speech and then perform automatic phoneme segmentation. We also detect fine pitch periods using laryngograph signals, which are used for prosodic feature extraction. For break strength allocation, four levels of break indices are determined from pause length and attached to phones to reflect prosodic variation at phrase boundaries. To predict break strength from text, we use statistical information on POS (part-of-speech) sequences. The best triphone sequences are selected by a Viterbi search that minimizes the accumulated Euclidean distance of the concatenation distortion. To obtain synthetic speech of sufficiently high quality for commercial use, we introduce a domain-specific database. Adding the domain-specific database to the general-domain database greatly improves the quality of synthetic speech in that domain. In a subjective evaluation, the new corpus-based Korean TTS system shows better naturalness than the conventional demisyllable-based system.
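
The unit-selection step can be sketched as a Viterbi (dynamic-programming) search. In this minimal version, each candidate unit is a hypothetical `(unit_id, left_features, right_features)` tuple, and the only cost is the Euclidean distance between the right-edge features of one unit and the left-edge features of the next, mirroring the accumulated concatenation distortion named in the abstract. Target costs, the feature set, and the function names are assumptions, not the authors' system.

```python
import math


def join_cost(prev_unit, next_unit):
    """Euclidean distance between the right edge of prev_unit and the left edge of next_unit."""
    _, _, prev_right = prev_unit
    _, next_left, _ = next_unit
    return math.dist(prev_right, next_left)


def select_units(candidates):
    """candidates[t] lists candidate units for target position t; returns (unit_ids, total_cost)."""
    if not candidates:
        return [], 0.0
    # cost[t][j]: lowest accumulated join cost of any path ending in candidate j at position t
    cost = [[0.0] * len(candidates[0])]
    back = [[None] * len(candidates[0])]
    for t in range(1, len(candidates)):
        prev = candidates[t - 1]
        row_cost, row_back = [], []
        for unit in candidates[t]:
            # Best predecessor for this candidate
            best_i = min(range(len(prev)), key=lambda i: cost[t - 1][i] + join_cost(prev[i], unit))
            row_cost.append(cost[t - 1][best_i] + join_cost(prev[best_i], unit))
            row_back.append(best_i)
        cost.append(row_cost)
        back.append(row_back)
    # Pick the cheapest final candidate and trace the back-pointers
    j = min(range(len(cost[-1])), key=cost[-1].__getitem__)
    total = cost[-1][j]
    path = []
    for t in range(len(candidates) - 1, -1, -1):
        path.append(candidates[t][j][0])
        j = back[t][j]
    path.reverse()
    return path, total
```

Adding a per-candidate target cost would only add one more term to each accumulated cost; the search structure stays the same.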

On a Pitch Alteration Technique in the V/UV Spectrum for High Quality Speech Synthesis Technique (고음질 합성방식용 V/UV 스펙트럼상의 피치변경법에 관한 연구)

  • Jo, Wang-Rae;Bae, Myung-Jin;Kim, Dong-Sung
    • The Journal of the Acoustical Society of Korea / v.15 no.6 / pp.99-103 / 1996
  • Most waveform coding techniques attempt to reduce the redundancy of the speech signal while preserving the shape of the waveform. In speech synthesis, waveform coding methods are attractive for synthesis by rule because they yield high-quality speech. However, it is difficult to apply waveform coding to synthesis by rule because the parameters of waveform coding cannot be classified as either excitation or vocal tract parameters. The proposed method shows little spectral distortion, 2.7% or less for 50% pitch changes. It also achieves smooth connection of waveform magnitudes between frames by compensating the phase in the time domain.

On a Pitch Change of the Waveform Coding by the Cepstrum Analysis of Speech Waveforms (켑스트럼 분석에 의한 파형부호화의 피치변경에 관한 연구)

  • Bae, Myung-Jin;Lee, Mi-Suk
    • The Journal of the Acoustical Society of Korea / v.11 no.4 / pp.14-21 / 1992
  • Waveform coding is concerned simply with preserving the shape of the speech signal through redundancy reduction. In speech synthesis, high-quality waveform coding is mainly used for synthesis by analysis. However, because the parameters of this coding are not separated into excitation and vocal tract parameters, it is difficult to apply waveform coding to synthesis by rule. In this paper, we propose a new pitch alteration method that changes the pitch period in waveform coding by using cepstrum analysis. This makes it possible to use waveform coding for synthesis by rule in speech processing.
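
The cepstral approach rests on a standard property of the real cepstrum: the slowly varying vocal-tract envelope occupies low quefrencies, while a periodic voiced excitation produces a peak at the pitch period. The dependency-free Python sketch that follows estimates the pitch period that way. It uses a naive O(N^2) DFT for self-containment (a practical system would use an FFT and a windowed frame), and `estimate_pitch_period` with its lag bounds is a hypothetical helper, not the authors' pitch alteration method.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive O(N^2) discrete Fourier transform, kept dependency-free for clarity."""
    n = len(x)
    sign = 1 if inverse else -1
    # Precomputed twiddle factors exp(sign * 2*pi*j*k/n)
    twiddle = [cmath.exp(sign * 2j * math.pi * k / n) for k in range(n)]
    out = [sum(x[t] * twiddle[(k * t) % n] for t in range(n)) for k in range(n)]
    return [v / n for v in out] if inverse else out


def real_cepstrum(frame, floor=1e-12):
    """c[n] = IDFT(log|DFT(frame)|); the floor avoids log(0) at spectral zeros."""
    log_mag = [math.log(max(abs(v), floor)) for v in dft(frame)]
    return [v.real for v in dft(log_mag, inverse=True)]


def estimate_pitch_period(frame, min_lag, max_lag):
    """Return the quefrency (in samples) of the largest cepstral peak in [min_lag, max_lag]."""
    c = real_cepstrum(frame)
    return max(range(min_lag, max_lag + 1), key=lambda n: c[n])
```

For example, at an 8 kHz sampling rate a lag range of 20 to 200 samples covers fundamental frequencies of 40 to 400 Hz, and a detected period of 40 samples corresponds to 200 Hz.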
