• Title/Summary/Keyword: Speech Synthesis


Speaker-Adaptive Speech Synthesis based on Fuzzy Vector Quantizer Mapping and Neural Networks (퍼지 벡터 양자화기 사상화와 신경망에 의한 화자적응 음성합성)

  • Lee, Jin-Yi;Lee, Gwang-Hyeong
    • The Transactions of the Korea Information Processing Society / v.4 no.1 / pp.149-160 / 1997
  • This paper addresses speaker-adaptive speech synthesis using a mapped codebook designed by fuzzy mapping on FLVQ (Fuzzy Learning Vector Quantization). FLVQ is used to design both the input speaker's and the reference speaker's codebooks. The algorithm incorporates a fuzzy membership function into LVQ (learning vector quantization) networks. Unlike the LVQ algorithm, it minimizes the network output errors, which are the differences between the target class membership values and the actual membership values, and as a result minimizes the distances between training patterns and competing neurons. Speaker adaptation in speech synthesis is performed as follows: the input speaker's codebook is mapped to the reference speaker's codebook in fuzzy terms. The fuzzy VQ mapping replaces a codevector while preserving its fuzzy membership function. The codevector correspondence histogram is obtained by accumulating the vector correspondences along the DTW optimal path. We use the fuzzy VQ mapping to design a mapped codebook, which is defined as a linear combination of the reference speaker's vectors, using each fuzzy histogram as a weighting function with membership values. In the adaptive speech synthesis stage, input speech is fuzzy vector-quantized with the mapped codebook, and FCM arithmetic is then used to synthesize speech adapted to the input speaker. The speaker adaptation experiments use speech from males in their thirties as the input speaker's speech and speech from a female in her twenties as the reference speaker's speech. The utterances used in the experiments are the sentences /anyoung hasim nika/ and /good morning/. As a result of the experiments, we obtained synthesized speech adapted to the input speaker.

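The two building blocks named in the abstract above, fuzzy memberships and a mapped codebook formed as a weighted combination of reference vectors, can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function names, the standard fuzzy c-means membership formula with fuzzifier `m`, and the assumption that the correspondence histogram is already available (for example from a DTW alignment) are all assumptions made here.

```python
def fuzzy_memberships(x, codebook, m=2.0):
    """Fuzzy c-means style memberships of vector x in each codevector.

    Uses squared Euclidean distances; the memberships sum to 1.
    The fuzzifier m > 1 is an assumed parameter.
    """
    d2 = [sum((a - b) ** 2 for a, b in zip(x, c)) for c in codebook]
    # An exact hit on a codevector receives all of the membership.
    if any(d == 0.0 for d in d2):
        hits = [1.0 if d == 0.0 else 0.0 for d in d2]
        total = sum(hits)
        return [h / total for h in hits]
    exponent = 1.0 / (m - 1.0)
    inv = [(1.0 / d) ** exponent for d in d2]
    total = sum(inv)
    return [v / total for v in inv]


def mapped_codebook(histogram, ref_codebook):
    """Build one mapped vector per input codevector.

    histogram[i][j] is the (assumed precomputed) correspondence weight
    between input codevector i and reference codevector j. Each mapped
    vector is the histogram-weighted average of the reference vectors.
    Rows are assumed to have a non-zero sum.
    """
    dim = len(ref_codebook[0])
    mapped = []
    for weights in histogram:
        total = sum(weights)
        mapped.append([
            sum(w * ref[k] for w, ref in zip(weights, ref_codebook)) / total
            for k in range(dim)
        ])
    return mapped


if __name__ == "__main__":
    print(fuzzy_memberships([0.0, 0.0], [[1.0, 0.0], [0.0, 2.0]]))
    print(mapped_codebook([[3.0, 1.0]], [[0.0, 0.0], [4.0, 8.0]]))
```

Normalizing each histogram row keeps every mapped vector inside the convex hull of the reference speaker's codevectors, which is one straightforward reading of "linear combination with a weighting function".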

Factored MLLR Adaptation for HMM-Based Speech Synthesis in Naval-IT Fusion Technology (인자화된 최대 공산선형회귀 적응기법을 적용한 해양IT융합기술을 위한 HMM기반 음성합성 시스템)

  • Sung, June Sig;Hong, Doo Hwa;Jeong, Min A;Lee, Yeonwoo;Lee, Seong Ro;Kim, Nam Soo
    • The Journal of Korean Institute of Communications and Information Sciences / v.38C no.2 / pp.213-218 / 2013
  • One of the most popular approaches to parameter adaptation in hidden Markov model (HMM) based systems is the maximum likelihood linear regression (MLLR) technique. In our previous study, we proposed factored MLLR (FMLLR), in which each MLLR parameter is defined as a function of a control vector. We presented a method to train the FMLLR parameters based on the general framework of the expectation-maximization (EM) algorithm. With the proposed algorithm, supplementary information that cannot be included in the models is effectively reflected in the adaptation process. In this paper, we apply the FMLLR algorithm to the pitch sequence as well as the spectrum parameters. In a series of experiments on the artificial generation of expressive speech, we evaluate the performance of the FMLLR technique and compare it with other approaches to parameter adaptation in HMM-based speech synthesis.

EVALUATION OF THE SYNTHETIC SPEECH QUALITY BY THE TD-PCULI METHOD

  • Kang, Chan-Hee;Shin, Yong-Jo;Kim, Yun-Seok;Kwon, Ki-Hyung;Chin, Yong-Ohk
    • Proceedings of the Acoustical Society of Korea Conference / 1994.06a / pp.977-983 / 1994
  • In this paper we evaluate the quality of synthetic speech produced by the proposed TD-PCULI speech synthesis method. For synthesis, we extracted parameters from Korean monosyllables by analyzing speech waveforms in the time domain. We constructed a Korean data format dictionary for synthesis-by-rule based on the frequencies in a large-vocabulary Korean pronunciation dictionary; it contains 19 V-type, 80 CV-type, 30 VC-type, and 100 CVC-type syllables. Using them, we synthesized various Korean monosyllables, words, and sentences. We tested 10 syllables selected for each of the 4 Korean syllable types with the objective MOS (Mean Opinion Score) evaluation method on 4 items, namely intelligibility, clearness, loudness, and naturalness, using a randomly selected group of listeners with no prior knowledge of the stimuli. We also tested whether duration and F0 can be modified by varying the duration (150 msec, 300 msec, 500 msec, 700 msec, and 1 sec) and the central fundamental frequency (80 Hz, 118 Hz, 140 Hz, 170 Hz, and 200 Hz). The experimental results show that the noise arising during rule-based synthesis is removed to the point that the speech is very clear, and that the prosodic elements can be controlled well.


Pruning Methodology for Reducing the Size of Speech DB for Corpus-based TTS Systems (코퍼스 기반 음성합성기의 데이터베이스 축소 방법)

  • 최승호;엄기완;강상기;김진영
    • The Journal of the Acoustical Society of Korea / v.22 no.8 / pp.703-710 / 2003
  • Because of the human-like quality of their synthesized speech, corpus-based text-to-speech (CB-TTS) systems have recently been studied actively worldwide. However, their large speech databases (DBs) severely restrict their application. In this paper we propose and evaluate three DB reduction algorithms designed to overcome this drawback. The first method is based on K-means clustering, which selects k representatives among multiple instances. The second method keeps only those unit instances that are selected during synthesis, using domain-restricted text as input to the synthesizer. The third method is a hybrid of the first two and uses a large text as input to the system. After the given sentences are synthesized, the unit instances used and their occurrence information are extracted. Next, a modified K-means clustering that also takes into account the occurrence information of the selected unit instances is applied. Finally, we compare the three pruning methods by evaluating the synthesized speech quality at similar DB reduction rates. Based on perceptual listening tests, we conclude that the last method performs best among the three algorithms. Moreover, the results show that the last method can reduce the DB size without loss of speech quality.
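The idea behind the third method above, clustering unit instances while weighting them by how often they were selected during synthesis, can be illustrated with a small occurrence-weighted K-means. This is a sketch under stated assumptions, not the paper's algorithm: the function name, the feature vectors, the deterministic initialization from the first `k` instances, and the rule of keeping the instance nearest each centroid are all hypothetical choices made for the example.

```python
def dist2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def weighted_kmeans_prune(features, counts, k, max_iters=50):
    """Return sorted indices of the unit instances to keep.

    features: one feature vector per unit instance (assumed given).
    counts:   how often each instance was selected during synthesis.
    k:        number of representatives to keep for this unit type.
    """
    k = min(k, len(features))
    dim = len(features[0])
    centroids = [list(f) for f in features[:k]]  # assumed deterministic init

    def assign_all():
        return [min(range(k), key=lambda c: dist2(f, centroids[c]))
                for f in features]

    assign = assign_all()
    for _ in range(max_iters):
        # Update step: each centroid is the occurrence-weighted mean.
        for c in range(k):
            members = [i for i, a in enumerate(assign) if a == c]
            total = sum(counts[i] for i in members)
            if total > 0:
                centroids[c] = [
                    sum(counts[i] * features[i][d] for i in members) / total
                    for d in range(dim)
                ]
        new_assign = assign_all()
        if new_assign == assign:
            break
        assign = new_assign

    # Keep the real instance closest to each centroid.
    kept = []
    for c in range(k):
        members = [i for i, a in enumerate(assign) if a == c]
        if members:
            kept.append(min(members, key=lambda i: dist2(features[i], centroids[c])))
    return sorted(kept)


if __name__ == "__main__":
    feats = [[0.0], [0.2], [1.0], [10.0], [10.4], [11.0]]
    usage = [1, 5, 1, 1, 1, 6]
    print(weighted_kmeans_prune(feats, usage, k=2))
```

Weighting the centroid by usage counts pulls each representative toward the instances the synthesizer actually selects, which is the intuition the abstract gives for combining the two simpler methods.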

Performance comparison of various deep neural network architectures using Merlin toolkit for a Korean TTS system (Merlin 툴킷을 이용한 한국어 TTS 시스템의 심층 신경망 구조 성능 비교)

  • Hong, Junyoung;Kwon, Chulhong
    • Phonetics and Speech Sciences / v.11 no.2 / pp.57-64 / 2019
  • In this paper, we construct a Korean text-to-speech system using the Merlin toolkit, an open-source system for speech synthesis. In text-to-speech systems, the HMM-based statistical parametric speech synthesis method is widely used, but the quality of its synthesized speech is known to be degraded by limitations of the acoustic modeling scheme that includes context factors. In this paper, we propose an acoustic modeling architecture that uses deep neural network techniques, which show excellent performance in various fields. The architecture includes a fully connected deep feedforward neural network (DNN), a recurrent neural network (RNN), a gated recurrent unit (GRU), long short-term memory (LSTM), and bidirectional LSTM (BLSTM). Experimental results show that performance improves when sequence modeling is included in the architecture, and that the architecture with LSTM or BLSTM performs best. It has also been found that including delta and delta-delta components in the acoustic feature parameters improves performance.
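The delta and delta-delta components mentioned above are time derivatives appended to each static acoustic feature. A minimal sketch of how they are commonly computed follows; the 3-tap windows (-0.5, 0, 0.5) and (1, -2, 1) and the edge handling by repeating the first and last frames are assumptions typical of statistical parametric synthesis setups, not details stated in the abstract, and the function names are hypothetical.

```python
def apply_window(track, window):
    """Apply a centered 3-tap window to a 1-D parameter track.

    The first and last frames are repeated so the output has the same
    length as the input (an assumed edge-handling choice).
    """
    padded = [track[0]] + list(track) + [track[-1]]
    return [
        sum(w * padded[t + j] for j, w in enumerate(window))
        for t in range(len(track))
    ]


def add_dynamic_features(track):
    """Return [static, delta, delta-delta] for every frame of the track."""
    delta = apply_window(track, (-0.5, 0.0, 0.5))   # first-order difference
    delta2 = apply_window(track, (1.0, -2.0, 1.0))  # second-order difference
    return [list(row) for row in zip(track, delta, delta2)]


if __name__ == "__main__":
    for frame in add_dynamic_features([0.0, 1.0, 4.0, 9.0]):
        print(frame)
```

In a real system the same windows would be applied to every dimension of the spectral and F0 features, tripling the size of the acoustic feature vector.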

Effect of Glottal Wave Shape on the Vowel Phoneme Synthesis (성문파형이 모음음소합성에 미치는 영향)

  • 안점영;김명기
    • The Journal of Korean Institute of Communications and Information Sciences / v.10 no.4 / pp.159-167 / 1985
  • By deriving glottal waves directly from the Korean vowels /a, e, I, o, u/, which were recorded by a male speaker, it was demonstrated that the glottal waves differ depending on the vowel. After the vowels were resynthesized with five simulated glottal waves, the effects of glottal wave shape on speech synthesis were compared in terms of waveform. Changes were seen in the waveforms of the synthetic vowels as the shape, opening time, and closing time varied; it was therefore confirmed that, in speech synthesis, the glottal wave shape is an important factor in improving speech quality.


Analysis and synthesis of Korean Vowels by LP Method (LP 방법에 의한 한국모음의 분석과 합성)

  • Son, Ho-In;Sin, Dong-Jin;An, Su-Gil
    • Journal of the Korean Institute of Telematics and Electronics / v.18 no.1 / pp.41-50 / 1981
  • Human speech contains many redundancies. To economize on communication channel capacity or memory size for computerized synthesis of human voices, the data must be compressed before sending. We treated the human speech organ as an eighth-order dynamic system that varies in time as the person speaks. Using an analyzer of our own design, eight parameters each were obtained for the vowels [아], [어], [오], [우], [으], [이], [애], and [외] of the Korean language, with considerable discrepancies between speakers. Supplying those parameters to a synthesizer that we built, we succeeded in simulating human speech for the above-mentioned Korean vowels and observed that the results bear all the features of the original speakers.

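Modeling the vocal tract as an eighth-order system, as described above, corresponds to estimating eight linear prediction (LP) coefficients per analysis frame. The sketch below uses the autocorrelation method with the Levinson-Durbin recursion; this is one standard way to obtain LP coefficients and is an assumption here, since the abstract does not say how the authors' analyzer computed its parameters. The function names are hypothetical.

```python
def autocorrelation(signal, max_lag):
    """Autocorrelation r[0..max_lag] of one frame of samples."""
    n = len(signal)
    return [
        sum(signal[t] * signal[t - lag] for t in range(lag, n))
        for lag in range(max_lag + 1)
    ]


def levinson_durbin(r, order):
    """Solve the LP normal equations.

    Returns (a, error), where a[0] = 1 and the prediction residual is
    e[t] = sum(a[j] * x[t - j] for j in 0..order).
    """
    a = [1.0] + [0.0] * order
    error = r[0]
    for i in range(1, order + 1):
        if error <= 0.0:
            break  # signal is already perfectly predicted (or all zero)
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / error  # reflection coefficient for this order
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        error *= 1.0 - k * k
    return a, error


def lpc(frame, order=8):
    """LP coefficients of one frame; order 8 mirrors the eighth-order model."""
    return levinson_durbin(autocorrelation(frame, order), order)


if __name__ == "__main__":
    # A first-order decaying signal should yield a[1] close to -0.5.
    coeffs, residual_energy = lpc([0.5 ** t for t in range(60)], order=2)
    print(coeffs, residual_energy)
```

Storing or transmitting only the eight coefficients per frame (plus excitation information) instead of the raw samples is the kind of data compression the abstract motivates.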

Perceptual Experiment on Number Production for Speaker Identification

  • Yang, Byung-Gon
    • Speech Sciences / v.8 no.1 / pp.7-19 / 2001
  • The acoustic parameters of nine Korean numbers were analyzed with Praat, a speech analysis program, and synthesized with SenSynPPC, a Klatt formant synthesizer. The overall intensity, pitch, and formant values of the numbers were modified dynamically in steps of 1 dB, 1 Hz, and 2.5%, respectively. The study explored the sensitivity of listeners to changes in the three acoustic parameters. Twelve subjects (male and female) listened to 390 pairs of synthesized numbers and judged whether each pair sounded the same or different. Results showed that subjects perceived the same sound quality within a range of 6.6 dB of intensity variation, 10.5 Hz of pitch variation, and 5.9% of variation in the first three formants. The male and female groups showed almost the same perceptual ranges. An asymmetry between the high and low boundaries was also observed. The ranges may be applicable to the development of a speaker identification system, while the method of synthesis modification may apply to its evaluation data.


A Computation Study of Prosodic Structures of Korean for Speech Recognition and Synthesis: Predicting Phonological Boundaries (음성인식.합성을 위한 한국어 운율단위 음운론의 계산적 연구:음운단위에 따른 경계의 발견)

  • Lee, Chan-Do
    • The Transactions of the Korea Information Processing Society / v.4 no.1 / pp.280-287 / 1997
  • Introducing phonological knowledge and prosodic information into speech recognition and synthesis systems is very important for building successful spoken language systems. First, related work in computational phonology is reviewed, and theoretical and experimental studies of prosodic structures and boundaries in Korean are summarized. The main focus of this study is to decide which prosodic phrasing trained on a simple recurrent network. The results show information other than phonetic features. This method can be combined with other useful information to predict the boundaries more accurately and to help segmentation, both of which are vital for successful speech recognition and synthesis systems.


A Study of Korean Phonetic and Phonological Properties for Speech Recognition and Synthesis (음성 인식/합성을 위한 국어의 음성-음운론적 특성 연구)

  • Chung, Kook;Koo, Hee-San;Lee, Chan-Do;Kim, Jong-Mi;Han, Sun-Hee
    • The Journal of the Acoustical Society of Korea / v.13 no.6 / pp.31-44 / 1994
  • The paper introduces several studies of various aspects of Korean phonology and phonetics for speech recognition and synthesis. The phonological and phonetic studies presented in this paper are: i) For the study of segmental phonology, we made an annotated list of Korean allophones and their corresponding alphabetic symbols for typing into computers. ii) For the study of segmental phonetics, we present some acoustic regulations in Korean consonants according to their phonological environment within a word. iii) For the study of prosodic phonology, we suggest the phonological functions of prosodic features and their acoustic cues. iv) For the study of prosodic phonetics, we present the characteristic patterns of accent and intonation in Korean. v) Finally, we suggest some ways of using this phonological and phonetic knowledge to improve speech recognition and synthesis.
