• Title/Summary/Keyword: articulatory synthesis


Improved Text-to-Speech Synthesis System Using Articulatory Synthesis and Concatenative Synthesis (조음 합성과 연결 합성 방식을 결합한 개선된 문서-음성 합성 시스템)

  • 이근희;김동주;홍광석
    • Proceedings of the IEEK Conference / 2002.06d / pp.369-372 / 2002
  • In this paper, we present an improved TTS synthesis system that combines articulatory synthesis and concatenative synthesis. In concatenative synthesis, segments of speech are excised from spoken utterances and connected to form the desired speech signal. We adopt LPC coefficients as the speech parameters, VQ to reduce memory requirements, and TD-PSOLA to address the naturalness problem.
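
The role of VQ (vector quantization) in reducing memory can be illustrated with a minimal sketch: each parameter vector is stored as the index of its nearest codeword rather than as the full vector. The codebook, the frame values, and all function names below are hypothetical and are not taken from the paper; a real system would train the codebook on LPC-derived vectors.

```python
# Hypothetical illustration of vector quantization (VQ); not the paper's code.

def nearest_codeword(vector, codebook):
    """Return the index of the codeword closest to `vector`
    (squared Euclidean distance)."""
    best_index = 0
    best_dist = float("inf")
    for index, codeword in enumerate(codebook):
        dist = sum((v - c) ** 2 for v, c in zip(vector, codeword))
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index


def quantize(frames, codebook):
    """Replace each parameter vector by the index of its nearest codeword."""
    return [nearest_codeword(frame, codebook) for frame in frames]


def reconstruct(indices, codebook):
    """Look the stored indices back up in the codebook."""
    return [codebook[i] for i in indices]


# Toy 2-dimensional codebook and frames (assumed values for illustration).
CODEBOOK = [(0.0, 0.0), (1.0, 1.0), (-1.0, 0.5)]
FRAMES = [(0.1, -0.1), (0.9, 1.2), (-0.8, 0.4)]

INDICES = quantize(FRAMES, CODEBOOK)          # one small integer per frame
APPROX_FRAMES = reconstruct(INDICES, CODEBOOK)
```

Only the indices and the shared codebook need to be stored, which is where the memory saving comes from; the cost is the quantization error between each frame and its codeword.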


Articulatory robotics (조음 로보틱스)

  • Nam, Hosung
    • Phonetics and Speech Sciences / v.13 no.2 / pp.1-7 / 2021
  • Speech is a spatiotemporally coordinated structure of constriction actions at discrete articulators such as the lips, tongue tip, tongue body, velum, and glottis. Like other human movements (e.g., reaching), each action as a linguistic task is completed by a synergy of the basic elements involved (e.g., bone, muscle, neural system). This paper discusses how speech tasks are dynamically related to joints, as one of the basic elements, in terms of the robotics of speech production. Further, this introduction of robotics to the speech sciences will hopefully deepen our understanding of how speech is produced and provide a solid foundation for developing a physical talking machine.

Implementation of nonlinear two-mass vocal folds digital model (성대의 비선형 2-mass 디지털 모델 구현)

  • Lee, Hui-Sung;Chung, Myung-Jin
    • Proceedings of the KIEE Conference / 2004.11c / pp.9-11 / 2004
  • The vocal folds play an important role in producing the glottal pulse, which is an essential factor in phonation. Several models implement vocal-fold dynamics, such as the one-mass, two-mass, multi-mass, and ribbon models. Among them, this paper uses the nonlinear two-mass model, which has a simple structure yet produces realistic glottal pulses and vocal-fold vibration, to realize a digital vocal-fold model. The pattern of vocal-fold movement is shown using this digital model. The paper also examines how the initial position of the vocal folds, variation in tension, and changes in lung pressure influence vibration and the glottal pulses.


Implementation of Continuous Utterance Using Buffer Rearrangement for Articulatory Synthesizer (조음 음성 합성기에서 버퍼 재정렬을 이용한 연속음 구현)

  • Lee, Hui-Sung;Chung, Myung-Jin
    • Proceedings of the KIEE Conference / 2002.07d / pp.2454-2456 / 2002
  • Since articulatory synthesis models the human vocal organs as precisely as possible, it is potentially the most desirable method for producing various words and languages. This paper proposes a new type of articulatory synthesizer using the Mermelstein vocal tract model and the Kelly-Lochbaum digital filter. Previous research has assumed that the length of the vocal tract, or the number of its cross sections, does not vary during an utterance. However, continuous utterances cannot easily be implemented under this assumption. This paper overcomes the limitation with "Buffer Rearrangement" for a dynamic vocal tract.
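
As background on the Kelly-Lochbaum filter, here is a minimal sketch of its core step under the standard lossless-tube assumption: the vocal tract is treated as a chain of uniform tube sections, and each junction scatters pressure waves according to a reflection coefficient set by the neighbouring cross-sectional areas. The function names and area values are hypothetical, the pressure-wave sign convention used here is one of several in the literature, and the sketch does not implement the paper's "Buffer Rearrangement".

```python
# Hypothetical sketch of Kelly-Lochbaum junction scattering (pressure waves).
# Not the paper's implementation; names and values are illustrative only.

def reflection_coefficients(areas):
    """Reflection coefficient at each junction between adjacent sections.

    For pressure waves travelling from section k into section k+1:
        r_k = (A_k - A_{k+1}) / (A_k + A_{k+1})
    """
    return [
        (a_k - a_next) / (a_k + a_next)
        for a_k, a_next in zip(areas, areas[1:])
    ]


def scatter(forward_in, backward_in, r):
    """One scattering junction.

    forward_in  : pressure wave arriving from section k (moving forward)
    backward_in : pressure wave arriving from section k+1 (moving backward)
    Returns (forward_out, backward_out): the wave transmitted into
    section k+1 and the wave reflected back into section k.
    """
    forward_out = (1.0 + r) * forward_in - r * backward_in
    backward_out = r * forward_in + (1.0 - r) * backward_in
    return forward_out, backward_out


# Assumed cross-sectional areas (arbitrary units) for three tube sections.
AREAS = [1.0, 3.0, 3.0]
COEFFS = reflection_coefficients(AREAS)

# A unit forward wave hitting the first junction, nothing coming back yet.
FORWARD_OUT, BACKWARD_OUT = scatter(1.0, 0.0, COEFFS[0])
```

With a fixed list of areas, the number of sections is constant, which is exactly the assumption the abstract describes as limiting for continuous utterances.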


Algorithm for Concatenating Multiple Phonemic Units for Small Size Korean TTS Using RE-PSOLA Method

  • Bak, Il-Suh;Jo, Cheol-Woo
    • Speech Sciences / v.10 no.1 / pp.85-94 / 2003
  • In this paper, an algorithm to reduce the size of a Text-to-Speech database is proposed. The algorithm is based on the characteristics of Korean phonemic units. From the initial database, a reduced phoneme unit set is derived using the articulatory similarity of concatenated phonemes. The speech data consist of 1000 phonetically balanced sentences read by one female announcer. All the recorded speech was then segmented by phoneticians. The total size of the original speech data is about 640 MB, including the laryngograph signal. RE-PSOLA (Residual-Excited Pitch Synchronous Overlap and Add) was used to synthesize the waveform. The voice quality of the synthesized speech was compared with the original speech in terms of spectrographic information and objective tests. The quality of the synthesized speech was not much degraded when the size of the synthesis DB was reduced from 320 MB to 82 MB.


Speech synthesis using acoustic Doppler signal (초음파 도플러 신호를 이용한 음성 합성)

  • Lee, Ki-Seung
    • The Journal of the Acoustical Society of Korea / v.35 no.2 / pp.134-142 / 2016
  • In this paper, a method for synthesizing speech signals from 40 kHz ultrasonic signals reflected from the articulatory muscles is introduced and its performance is evaluated. When ultrasound is radiated onto the articulating face, Doppler effects caused by movements of the lips, jaw, and chin are observed: the received signals contain components whose frequencies differ from that of the transmitted signal. These ADS (Acoustic Doppler Signals) were used to estimate the speech parameters in this study. Prior to synthesizing the speech signal, a quantitative correlation analysis between the ADS and the speech signals was carried out for each frequency bin. The results validated the feasibility of ADS-based speech synthesis. ADS-to-speech transformation was achieved with conversion rules based on a joint Gaussian mixture model. The experimental results from 5 subjects showed that filter bank energies and LPC (Linear Predictive Coefficient) cepstrum coefficients are the optimal features for the ADS and the speech, respectively. In the subjective evaluation, where synthesized speech signals were obtained using excitation sources extracted from the original speech signals, the ADS-to-speech conversion method yielded an average recognition rate of 72.2 %.
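
To give a sense of scale for the frequency shifts mentioned above, the following sketch estimates the Doppler shift of a 40 kHz carrier reflected from a slowly moving surface. It uses the common two-way approximation Δf ≈ 2·v·f0/c, valid when the surface speed v is much smaller than the speed of sound c. The speed of sound (343 m/s) and the articulator speed (0.1 m/s) are assumed values for illustration and do not come from the paper.

```python
# Hypothetical back-of-the-envelope Doppler estimate; values are assumptions.

CARRIER_HZ = 40_000.0        # transmitted ultrasound frequency (from the abstract)
SPEED_OF_SOUND_M_S = 343.0   # assumed: air at roughly room temperature


def doppler_shift_hz(surface_speed_m_s,
                     carrier_hz=CARRIER_HZ,
                     speed_of_sound_m_s=SPEED_OF_SOUND_M_S):
    """Approximate two-way Doppler shift for a reflecting surface.

    Positive speed means the surface moves toward the transducer,
    which raises the received frequency. Valid for speeds much
    smaller than the speed of sound.
    """
    return 2.0 * surface_speed_m_s * carrier_hz / speed_of_sound_m_s


# Assumed articulator speed of 0.1 m/s toward the sensor.
SHIFT_HZ = doppler_shift_hz(0.1)
```

Under these assumptions the shift is on the order of tens of hertz around the carrier, which suggests why a per-frequency-bin analysis of the received signal is a natural way to look at it.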