• Title/Summary/Keyword: Synthetic Speech

The Korean Text-to-speech Using Syllable Units (음절 단위를 이용한 한국어 음성 합성)

  • 김병수;윤기선;박성한
    • Journal of the Korean Institute of Telematics and Electronics / v.27 no.1 / pp.143-150 / 1990
  • In this paper, a rule-based method for improving the intelligibility of synthetic speech is proposed. A 12-pole linear predictive coding (LPC) method is used to model syllable speech signals. A syllable concatenation rule for pauses and frame rejection between syllables is developed to improve the naturalness of the synthetic speech. In addition, a phonological structure transformation rule and a prosody rule are applied to the LPC-synthesized speech. The illustrative results demonstrate that the synthetic speech obtained by applying these rules is more natural than speech synthesized by LPC alone.

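
As a rough illustration of the 12-pole linear prediction analysis mentioned in the abstract above, the sketch below estimates predictor coefficients with the autocorrelation method and the Levinson-Durbin recursion. It is not the authors' implementation: the abstract does not say which estimation method was used, and the names `autocorrelation` and `lpc_coefficients` are assumptions introduced here.

```python
def autocorrelation(frame, max_lag):
    """Return autocorrelation values r[0..max_lag] of one speech frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def lpc_coefficients(frame, order=12):
    """Estimate LPC coefficients with the Levinson-Durbin recursion.

    Returns (a, err) where a[0] == 1.0 and the predictor is
    x[n] ~ -sum(a[k] * x[n - k] for k in 1..order).
    err is the remaining prediction error energy.
    """
    r = autocorrelation(frame, order)
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:
            break  # silent or perfectly predictable frame; stop early
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient for this order
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= (1.0 - k * k)
    return a, err
```

In practice a frame would usually be windowed (for example with a Hamming window) before analysis; that step is omitted here to keep the sketch short.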

Synchronization of Synthetic Facial Image Sequences and Synthetic Speech for Virtual Reality (가상현실을 위한 합성얼굴 동영상과 합성음성의 동기구현)

  • 최장석;이기영
    • Journal of the Korean Institute of Telematics and Electronics S / v.35S no.7 / pp.95-102 / 1998
  • This paper proposes a method for synchronizing synthetic facial image sequences with synthetic speech. LP-PSOLA synthesizes the speech for each demi-syllable, and we provide 3,040 demi-syllables for unlimited synthesis of Korean speech. For synthesis of the facial image sequences, the paper defines a total of 11 fundamental patterns for the lip shapes of the Korean consonants and vowels. These fundamental lip shapes suffice to pronounce all Korean sentences. The image synthesis method assigns the fundamental lip shapes to the key frames according to the initial, middle, and final sound of each syllable in the Korean input text, and interpolates the naturally changing lip shapes in the inbetween frames. The number of inbetween frames is estimated from the duration of each syllable of the synthetic speech, and this estimate achieves synchronization of the facial image sequences and the speech. In speech synthesis, disk memory is required to store the 3,040 demi-syllables. In synthesis of the facial image sequences, however, disk memory is required to store only one image, because all frames are synthesized from the neutral face. The method realizes a synchronized system that can read Korean sentences aloud with synthetic speech and synthetic facial image sequences.

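
The synchronization idea in the abstract above (derive the number of inbetween frames from each syllable's duration, then interpolate lip shapes between key frames) can be sketched as follows. Several things here are assumptions rather than details from the paper: the frame rate of 30 fps, the representation of a lip shape as a tuple of numbers, the use of linear interpolation, and the function names.

```python
def inbetween_count(duration_s, n_keyframes, fps=30):
    """Frames left for interpolation once the key frames are placed."""
    total = max(n_keyframes, round(duration_s * fps))
    return total - n_keyframes


def interpolate_lips(key_shapes, duration_s, fps=30):
    """Spread key lip shapes over one syllable and fill the gaps linearly.

    key_shapes: list of equal-length tuples, e.g. (width, height), for the
    initial, middle, and final sound of the syllable (hypothetical format).
    Returns one lip shape per video frame.
    """
    if len(key_shapes) < 2:
        return [tuple(s) for s in key_shapes]
    gaps = len(key_shapes) - 1
    extra = inbetween_count(duration_s, len(key_shapes), fps)
    frames = []
    for g in range(gaps):
        # Distribute the inbetween frames as evenly as possible across gaps.
        n = extra // gaps + (1 if g < extra % gaps else 0)
        start, end = key_shapes[g], key_shapes[g + 1]
        for step in range(n + 1):
            t = step / (n + 1)
            frames.append(tuple(s + (e - s) * t for s, e in zip(start, end)))
    frames.append(tuple(key_shapes[-1]))
    return frames
```

Because the frame count is tied to the syllable duration reported by the speech synthesizer, the video for a syllable ends when its audio does, which is the property the abstract relies on.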

Correlation Analysis of PESQ and MOS Evaluation for HMM-based Synthetic Korean Speech (HMM 기반의 한국어 합성음에 대한 PESQ 및 MOS 평가의 상관도 분석)

  • Lin, Cang-Song;Bae, Keun-Sung
    • Phonetics and Speech Sciences / v.2 no.1 / pp.71-75 / 2010
  • PESQ is an objective speech quality evaluation measure that is known to correlate highly with subjective speech quality measures such as MOS. To examine whether it could be useful as an objective quality measure for synthetic speech, we carried out subjective evaluation tests with MOS and DMOS and an objective evaluation test with PESQ on HMM-based Korean synthetic speech signals, and analyzed the correlation between them. Experimental results show that PESQ has correlations of 0.87 with MOS and 0.92 with DMOS. This suggests that PESQ holds much promise for evaluating the quality of synthetic Korean speech.

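
The correlation analysis described in the abstract above can be illustrated with a Pearson correlation coefficient computed over per-utterance score pairs. The abstract does not state which correlation measure was used, so Pearson's r is an assumption here, and the function name `pearson` is introduced only for this sketch.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 items")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / math.sqrt(var_x * var_y)
```

To use it, pass the PESQ scores as `xs` and the listener-averaged MOS (or DMOS) scores for the same utterances as `ys`; a value near 1 indicates that the objective measure tracks the subjective ratings closely.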

Prosody Control of the Synthetic Speech using Sampling Rate Conversion (표본화율 변환을 이용한 합성음의 운율제어)

  • 이현구;홍광석
    • Proceedings of the IEEK Conference / 1999.11a / pp.676-679 / 1999
  • In this paper, we present a method for controlling the prosody of synthetic speech using a sampling rate conversion technique. Conventional prosody control methods perform overlap-and-add, so the synthetic speech is distorted and the voice quality is unsatisfactory. Using the sampling rate conversion technique, we can obtain high-quality synthetic speech. We can also control the speaking rate according to the speaker's patterns.

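
To make the sampling rate conversion idea in the abstract above concrete, here is a minimal resampler based on linear interpolation. The paper's actual conversion scheme is not described in the abstract, so this is only a generic sketch: the function name `resample_linear` and the `ratio` parameter are assumptions, and no anti-aliasing filter is applied.

```python
def resample_linear(samples, ratio):
    """Resample a signal by `ratio` (new_rate / old_rate).

    The output has about len(samples) * ratio samples. Played back at the
    original rate, ratio > 1 lengthens the segment and lowers its pitch,
    and ratio < 1 shortens it and raises its pitch.
    """
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    if not samples:
        return []
    n_out = max(1, round(len(samples) * ratio))
    out = []
    for i in range(n_out):
        pos = i / ratio          # position in the input signal
        lo = int(pos)
        if lo >= len(samples) - 1:
            out.append(float(samples[-1]))  # hold the last sample
            continue
        frac = pos - lo
        out.append(samples[lo] * (1 - frac) + samples[lo + 1] * frac)
    return out
```

A production system would normally low-pass filter before reducing the rate and use a higher-order interpolator; linear interpolation is chosen here only for brevity.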

Control of Duration Model Parameters in HMM-based Korean Speech Synthesis (HMM 기반의 한국어 음성합성에서 지속시간 모델 파라미터 제어)

  • Kim, Il-Hwan;Bae, Keun-Sung
    • Speech Sciences / v.15 no.4 / pp.97-105 / 2008
  • HMM-based text-to-speech systems (HTS) have been widely studied because they need less memory and have lower computational complexity than corpus-based unit concatenation text-to-speech systems, which makes them suitable for embedded systems. They also have the advantage that the voice characteristics and speaking rate of the synthetic speech can be converted easily by modifying the HMM parameters appropriately. We implemented an HMM-based Korean text-to-speech system using a small Korean speech DB and propose a method to increase the naturalness of the synthetic speech by controlling the duration model parameters in the system. We performed a paired comparison test to verify that these techniques are effective. The test result, with a preference score of 73.8%, shows that controlling the duration model parameters improves the naturalness of the synthetic speech.


A New Pruning Method for Synthesis Database Reduction Using Weighted Vector Quantization

  • Kim, Sanghun;Lee, Youngjik;Keikichi Hirose
    • The Journal of the Acoustical Society of Korea / v.20 no.4E / pp.31-38 / 2001
  • A large-scale synthesis database for a unit selection based synthesis method usually retains redundant synthesis unit instances that do not contribute to synthetic speech quality. In this paper, to eliminate those instances from the synthesis database, we propose a new pruning method called weighted vector quantization (WVQ). WVQ reflects the relative importance of each synthesis unit instance when clustering similar instances using the vector quantization (VQ) technique. The proposed method was compared with two conventional pruning methods through objective and subjective evaluations of synthetic speech quality: one that simply limits the maximum number of instances, and another based on normal VQ-based clustering. The proposed method showed the best performance at reduction rates under 50%. At reduction rates over 50%, the synthetic speech quality is degraded perceptibly, though not seriously. Using the proposed method, the synthesis database can be reduced efficiently without serious degradation of synthetic speech quality.

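
One way to picture the weighted clustering described in the abstract above is a k-means style vector quantizer whose centroids are weight-averaged, followed by keeping one representative instance per cluster. This is a sketch under stated assumptions, not the paper's algorithm: how the importance weights are defined, how clusters are initialized, and which instance survives pruning are not given in the abstract, and the names `weighted_vq` and `prune` are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def weighted_vq(vectors, weights, k, iterations=20):
    """Weighted k-means: each centroid is the weight-averaged mean of its members.

    Returns (centroids, assignments). Initial centroids are simply the
    first k vectors (an assumption made to keep the sketch deterministic).
    """
    if not 0 < k <= len(vectors):
        raise ValueError("k must be between 1 and the number of vectors")
    dim = len(vectors[0])
    centroids = [list(v) for v in vectors[:k]]
    assign = [0] * len(vectors)
    for _ in range(iterations):
        assign = [min(range(k), key=lambda c: sq_dist(v, centroids[c]))
                  for v in vectors]
        for c in range(k):
            members = [i for i, a in enumerate(assign) if a == c]
            total = sum(weights[i] for i in members)
            if total == 0:
                continue  # empty or zero-weight cluster keeps its old centroid
            centroids[c] = [
                sum(weights[i] * vectors[i][d] for i in members) / total
                for d in range(dim)
            ]
    return centroids, assign


def prune(vectors, weights, k):
    """Return the indices of the instances kept: one per non-empty cluster,
    chosen as the member nearest to the weighted centroid."""
    centroids, assign = weighted_vq(vectors, weights, k)
    kept = []
    for c in range(k):
        members = [i for i, a in enumerate(assign) if a == c]
        if members:
            kept.append(min(members,
                            key=lambda i: sq_dist(vectors[i], centroids[c])))
    return sorted(kept)
```

Heavier instances pull the centroid toward themselves, so they are more likely to be the retained representative, which is the intuition behind weighting instances by importance.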

A Short-term and Long-term Usability Testing of the Speech Synthesizer for the People with Visual Impairments (시각장애인용 음성합성기에 대한 장/단기 사용성 평가)

  • Lee, H.Y.;Hong, K.H.
    • Journal of rehabilitation welfare engineering & assistive technology / v.9 no.1 / pp.53-60 / 2015
  • We conducted long-term and short-term usability tests of the built-in speech synthesizer of a screen reader for people with visual impairments. A total of 20 persons with visual impairments participated in the short-term usability test, and 10 of them participated in the long-term usability test. Naturalness and clarity of the synthetic speech were evaluated by MOS scores, preference among various types of synthetic speech was examined through a preference test, and the users' satisfaction level and other requirements for the synthetic speech were evaluated by open feedback. We also examined naturalness, clarity, preference, and user requirements for the synthetic speech through the long-term usability test. We then compare and contrast the long-term and short-term usability test results.


Formant Locus Overlapping Method to Enhance Naturalness of Synthetic Speech (합성음의 자연도 향상을 위한 포먼트 궤적 중첩 방법)

  • 안승권;성굉모
    • Journal of the Korean Institute of Telematics and Electronics B / v.28B no.10 / pp.755-760 / 1991
  • In this paper, we propose a new formant locus overlapping method that effectively enhances the naturalness of synthetic speech produced by a demisyllable-based Korean text-to-speech system. First, Korean demisyllables are divided into several segments that have linear formant transition characteristics. Then a database composed of the start point and length of each formant segment is built. When we synthesize speech with this demisyllable database, we concatenate the formant loci using the proposed overlapping method, which closely simulates the human articulation mechanism. We have implemented a Korean text-to-speech system using this method and shown that the formant loci of the synthetic speech are similar to those of natural speech. Finally, we illustrate that the spectrograms produced by the proposed method are more similar to natural speech than those of the conventional method.


Unit Generation Based on Phrase Break Strength and Pruning for Corpus-Based Text-to-Speech

  • Kim, Sang-Hun;Lee, Young-Jik;Hirose, Keikichi
    • ETRI Journal / v.23 no.4 / pp.168-176 / 2001
  • This paper discusses two important issues of corpus-based synthesis: synthesis unit generation based on phrase break strength information and pruning of redundant synthesis unit instances. First, a new sentence set for recording was designed to make an efficient synthesis database, reflecting the characteristics of the Korean language. To obtain prosodic context sensitive units, we graded major prosodic phrases into 5 distinctive levels according to pause length and then discriminated intra-word triphones using the levels. Using the synthesis units with phrase break strength information, synthetic speech was generated and evaluated subjectively. Second, a new pruning method based on weighted vector quantization (WVQ) was proposed to eliminate redundant synthesis unit instances from the synthesis database. WVQ takes the relative importance of each instance into account when clustering similar instances using the vector quantization (VQ) technique. The proposed method was compared with two conventional pruning methods through objective and subjective evaluations of synthetic speech quality: one that simply limits the maximum number of instances, and another based on normal VQ-based clustering. For the same reduction rate in the number of instances, the proposed method showed the best performance. The synthetic speech at a reduction rate of 45% had almost no perceptible degradation compared to the synthetic speech without instance reduction.


A Study on the Korean Text-to-Speech Using Demisyllable Units (반음절단위를 이용한 한국어 음성합성에 관한 연구)

  • Yun, Gi-Sun;Park, Sung-Han
    • Journal of the Korean Institute of Telematics and Electronics / v.27 no.10 / pp.138-145 / 1990
  • This paper presents a rule-based speech synthesis method that improves the naturalness of synthetic speech while using a small database based on demisyllable units. A 12-pole linear predictive coding (LPC) method is used to analyze demisyllable speech signals. A syllable and vowel concatenation rule is developed to improve the naturalness and intelligibility of the synthetic speech. In addition, a phonological structure transformation rule using a neural network and prosody rules are applied to the synthetic speech.
