• Title/Summary/Keyword: Synthetic Speech

Search Result 84, Processing Time 0.026 seconds

A Unit Selection Methods using Variable Break in a Japanese TTS (일본어 TTS의 가변 Break를 이용한 합성단위 선택 방법)

  • Na, Deok-Su;Bae, Myung-Jin
    • Proceedings of the IEEK Conference
    • /
    • 2008.06a
    • /
    • pp.983-984
    • /
    • 2008
  • This paper proposes a variable break that can offset prediction error as well as a pre-selection methods, based on the variable break, for enhanced unit selection. In Japanese, a sentence consists of several APs (Accentual phrases) and MPs (Major phrases), and the breaks between these phrases must predicted to realize text-to-speech systems. An MP also consists of several APs and plays a decisive role in making synthetic speech natural and understandable because short pauses appear at its boundary. The variable break is defined as a break that is able to change easily from an AP to an MP boundary, or from an MP to an AP boundary. Using CART (Classification and Regression Trees), the variable break is modeled stochastically, and then we pre-select candidate units in the unit-selection process. As the experimental results show, it was possible to complement a break prediction error and improve the naturalness of synthetic speech.

  • PDF

Corpus-based Korean Text-to-speech Conversion System (콜퍼스에 기반한 한국어 문장/음성변환 시스템)

  • Kim, Sang-hun; Park, Jun;Lee, Young-jik
    • The Journal of the Acoustical Society of Korea
    • /
    • v.20 no.3
    • /
    • pp.24-33
    • /
    • 2001
  • this paper describes a baseline for an implementation of a corpus-based Korean TTS system. The conventional TTS systems using small-sized speech still generate machine-like synthetic speech. To overcome this problem we introduce the corpus-based TTS system which enables to generate natural synthetic speech without prosodic modifications. The corpus should be composed of a natural prosody of source speech and multiple instances of synthesis units. To make a phone level synthesis unit, we train a speech recognizer with the target speech, and then perform an automatic phoneme segmentation. We also detect the fine pitch period using Laryngo graph signals, which is used for prosodic feature extraction. For break strength allocation, 4 levels of break indices are decided as pause length and also attached to phones to reflect prosodic variations in phrase boundaries. To predict the break strength on texts, we utilize the statistical information of POS (Part-of-Speech) sequences. The best triphone sequences are selected by Viterbi search considering the minimization of accumulative Euclidean distance of concatenating distortion. To get high quality synthesis speech applicable to commercial purpose, we introduce a domain specific database. By adding domain specific database to general domain database, we can greatly improve the quality of synthetic speech on specific domain. From the subjective evaluation, the new Korean corpus-based TTS system shows better naturalness than the conventional demisyllable-based one.

  • PDF

Design of a Low Bit-rate Speech Coder Based on Mixed Multi-band Excitation Model (혼합 다중대역 여기모델에 기반한 저 전송률 음성 부호화기의 설계)

  • 한우진;오영환
    • The Journal of the Acoustical Society of Korea
    • /
    • v.21 no.6
    • /
    • pp.510-521
    • /
    • 2002
  • MBE (multi-band excitation) coder can achieve high qualify synthetic speech below 4.0 kbps. There are, however, significant differences of the fine structure between the original spectrum and the synthetic spectrum. They are mainly due to the exclusive partition of voiced and unvoiced regions in frequency domain and the decision procedure based on the experimental threshold. This paper proposes MMBE (mixed multi-band excitation) speech model to overcome drawbacks of a MBE coder. In addition, two analysis methods, which do not need my decision procedure based on a threshold, are presented. Both voiced and unvoiced components can be mixed over all the frequency axis in the MMBE speech model. To illustrate the potential of the proposed speech model, we develop a 2.6 kbps MMBE coder and compare it with a 2.9 kbps MBE coder by both objective and subjective methods. The results have shown that the proposed coder has a better performance even at a lower bit-rate compared with the MBE coder.

Synthetic Speech Quality Improvement By Glottal parameter Interpolation - Preliminary study on open quotient interpolation in the speech corpus - (성대특성 보간에 의한 합성음의 음질향상 - 음성코퍼스 내 개구간 비 보간을 위한 기초연구 -)

  • Bae, Jae-Hyun;Oh, Yung-Hwa
    • Proceedings of the KSPS conference
    • /
    • 2005.11a
    • /
    • pp.63-66
    • /
    • 2005
  • For the Large Corpus based TTS the consistency of the speech corpus is very important. It is because the inconsistency of the speech quality in the corpus may result in a distortion at the concatenation point. And because of this inconsistency, large corpus must be tuned repeatedly One of the reasons for the inconsistency of the speech corpus is the different glottal characteristics of the speech sentence in the corpus. In this paper, we adjusted the glottal characteristics of the speech in the corpus to prevent this distortion. And the experimental results are showed.

  • PDF

On a Pitch Alteration Technique in Time-Frequency Hybrid Domain for High Quality Prosody Control of Speech Signal (고음질 운율조절용 시간-주파수 혼성영역 피치변경법)

  • Lee, Sang-Hyo;Bae, Myung-Jin
    • The Journal of the Acoustical Society of Korea
    • /
    • v.16 no.4
    • /
    • pp.106-109
    • /
    • 1997
  • In the area of the speech synthesis techniques, the waveform coding methods maintain the intelligibility and naturalness of synthetic speech. In order to apply the waveform coding techniques to synthesis by rule, however, we must be able to alter the pitches for prosody control of synthetic speech. In this paper, we propose a new pitch alteration technique in time-frequency hybrid domain, that compensates phase distortion of the cepstral pitch alteration method with time scaling method in the time domain. This method can remove some phase spectrum distortion which is occurred in conjunction point between the waveforms in continued frames. Also, we can obtain little magnitude spectrum distortion below 1.18% for pitch alteration of 200%.

  • PDF

Complexity Reduction Algorithm of Speech Coder(EVRC) for CDMA Digital Cellular System

  • Min, So-Yeon
    • Journal of Korea Multimedia Society
    • /
    • v.10 no.12
    • /
    • pp.1551-1558
    • /
    • 2007
  • The standard of evaluating function of speech coder for mobile telecommunication can be shown in channel capacity, noise immunity, encryption, complexity and encoding delay largely. This study is an algorithm to reduce complexity applying to CDMA(Code Division Multiple Access) mobile telecommunication system, which has a benefit of keeping the existing advantage of telecommunication quality and low transmission rate. This paper has an objective to reduce the computing complexity by controlling the frequency band nonuniform during the changing process of LSP(Line Spectrum Pairs) parameters from LPC(Line Predictive Coding) coefficients used for EVRC(Enhanced Variable-Rate Coder, IS-127) speech coders. Its experimental result showed that when comparing the speech coder applied by the proposed algorithm with the existing EVRC speech coder, it's decreased by 45% at average. Also, the values of LSP parameters, Synthetic speech signal and Spectrogram test result were obtained same as the existing method.

  • PDF

SPEECH SYNTHESIS USING LARGE SPEECH DATA-BASE

  • Lee, Kyu-Keon;Mochida, Takemi;Sakurai, Naohiro;Shirai, Katasuhiko
    • Proceedings of the Acoustical Society of Korea Conference
    • /
    • 1994.06a
    • /
    • pp.949-956
    • /
    • 1994
  • In this paper, we introduce a new speech synthesis method for Japanese and Korean arbitrary sentences using the natural speech data-base. Also, application of this method to a CAI system is discussed. In our synthesis method, a basic sentence and basic accent-phrases are selected from the data-base against a target sentence. Factors for those selections are phrase dependency structure (separation degree), number of morae, type of accent and phonemic labels. The target pitch pattern and phonemic parameter series are generated using those selected basic units. As the pitch pattern is generated using patterns which are directly extracted form real speech, it is expected to be more natural than any other pattern which is estimated by any model. Until now, we have examined this method on Japanese sentence speech and affirmed that the synthetic sound preserves human-like features fairly well. Now we extend this method to Korean sentence speech synthesis. Further more, we are trying to apply this synthesis unit to a CAI system.

  • PDF

Speech Synthesis Algorithm Using Mixed Phase Information for TTS Systems (혼합 위상 정보를 이용한 TTS 합성음 생성 알고리즘)

  • Kwon, Chul-Hong;Lee, Min-Kyu
    • Speech Sciences
    • /
    • v.8 no.4
    • /
    • pp.35-43
    • /
    • 2001
  • New speech synthesis algorithms capable of flexible prosody (especially F0) modification are desired for a high quality TTS system. TD-PSOLA is the most popular synthesis algorithm. The algorithm shows very high quality when F0 modification is limited. However, the quality degradation due to pitch epoch detection error becomes severe as the F0 modification factor becomes large. On the other hand, the vocoder framework is very flexible in F0 manipulation. The synthesized speech quality from the vocoder is far from natural human speech and suffers from buzziness. To remedy the buzzy quality from the vocoder and make more natural synthetic speech, we propose a mixed phase vocoder.

  • PDF

Implementation of Formant Speech Analysis/Synthesis System (포만트 분석/합성 시스템 구현)

  • Lee, Joon-Woo;Son, Ill-Kwon;Bae, Keuo-Sung
    • Speech Sciences
    • /
    • v.1
    • /
    • pp.295-314
    • /
    • 1997
  • In this study, we will implement a flexible formant analysis and synthesis system. In the analysis part, the two-channel (i.e., speech & EGG signals) approach is investigated for accurate estimation of formant information. The EGG signal is used for extracting exact pitch information that is needed for the pitch synchronous LPC analysis and closed phase LPC analysis. In the synthesis part, Klatt formant synthesizer is modified so that the user can change synthesis parameters arbitarily. Experimental results demonstrate the superiority of the two-channel analysis method over the one-channel(speech signal only) method in analysis as well as in synthesis. The implemented system is expected to be very helpful for studing the effects of synthesis parameters on the quality of synthetic speech and for the development of Korean text-to-speech(TTS) system with the formant synthesis method.

  • PDF

Variable Rate IMBE-LP Coding Algorithm Using Band Information (주파수대역 정보를 이용한 가변률 IMBE-LP 음성부호화 알고리즘)

  • Park, Man-Ho;Bae, Geon-Seong
    • Journal of the Institute of Electronics Engineers of Korea SP
    • /
    • v.38 no.5
    • /
    • pp.576-582
    • /
    • 2001
  • The Multi-Band Excitation(MBE) speech coder uses a different approach for the representation of the excitation signal. It replaces the frame-based single voiced/unvoiced classification of a classical speech coder with a set of such decision over harmonic intervals in the frequency domain. This enables each speech segment to be a mixture of voiced and unvoiced, and improves the synthetic speech quality by reducing decision errors that might occur on the frame-based single voiced and unvoiced decision process when input speech is degraded with noise. The IMBE-LP, improved version of MBE with linear prediction, represents the spectral information of MBE model with linear prediction coefficients to obtain low bit rate of 2.4 kbps. In this Paper, we proposed a variable rate IMBE-LP vocoder that has lower bit rate than IMBE-LP without degrading the synthetic speech quality. To determine the LP order, it uses the spectral band information of the MBE model that has something to do with he input speech's characteristics. Experimental results are riven with our findings and discussions.

  • PDF