• Title/Summary/Keyword: Speech Synthesis

Common Speech Database Collection for Telecommunications (통신망환경 한국어 공통음성 DB 구축)

  • Kim Sanghun;Park Moonwhan;Kim Hyunsuk
    • Proceedings of the KSPS conference
    • /
    • 2003.05a
    • /
    • pp.23-26
    • /
    • 2003
  • This paper presents a common speech database collection for telecommunication applications. Over a three-year project, we will construct very large-scale speech and text databases for speech recognition, speech synthesis, and speaker identification. The design of the common speech database takes into account various communication environments and the distribution of speakers by sex, age, and region. It consists of Korean continuous digits, isolated words, and sentences that reflect Korean phonetic coverage. In addition, it covers various speaking styles, such as read speech, dialogue speech, and semi-spontaneous speech. The common speech databases prevent duplicated investment of resources across the Korean speech industry, which should encourage the domestic speech industry and stimulate the domestic speech technology market.

  • PDF

The Phoneme Synthesis of Korean CV Mono-Syllables (한국어 CV단음절의 음소합성)

  • 안점영;김명기
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.11 no.2
    • /
    • pp.93-100
    • /
    • 1986
  • We analyzed Korean CV monosyllables, formed by concatenating the consonants /k, t, p, g/ and their fortis and rough-sound (aspirated) counterparts with the vowels /a, e, o, u, I/, using the PARCOR technique, and then synthesized the speech by phoneme synthesis, controlling the analyzed data. In the speech analysis, the duration of the consonants decreases in the order of the rough sound, the lenis, and the fortis, and their gain decreases in the same order. The pitch period of the following vowel increases in the order of the rough sound, the fortis, and the lenis. We synthesized the lenis and the fortis by controlling the duration and the gain of the rough sound, and we synthesized the vowels following the fortis and the rough sound by controlling the pitch period and the duration of the vowels following the lenis. As a result, the synthesized speech quality is good, and we confirm that rules for phoneme synthesis of Korean speech can be formulated.

  • PDF
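
The PARCOR analysis named in the abstract above can be illustrated with a minimal sketch. It assumes the standard Levinson-Durbin recursion on the frame autocorrelation; the function names, the model order, and the sign convention (predictor polynomial A(z) = 1 + a1 z^-1 + ...) are illustrative choices, not details taken from the paper.

```python
def autocorrelation(frame, max_lag):
    """Biased autocorrelation r[0..max_lag] of one analysis frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def parcor(frame, order):
    """Levinson-Durbin recursion.

    Returns (reflection coefficients k[1..order], final prediction error).
    Sign convention is an assumption: A(z) = 1 + a[1] z^-1 + ... + a[p] z^-p.
    """
    r = autocorrelation(frame, order)
    a = [1.0] + [0.0] * order   # predictor polynomial coefficients
    err = r[0]                  # prediction error energy at order 0
    ks = []
    for m in range(1, order + 1):
        if err <= 0.0:          # silent or degenerate frame: stop updating
            ks.append(0.0)
            continue
        acc = r[m] + sum(a[i] * r[m - i] for i in range(1, m))
        k = -acc / err
        ks.append(k)
        prev = a[:]
        for i in range(1, m):   # order-update of the predictor
            a[i] = prev[i] + k * prev[m - i]
        a[m] = k
        err *= (1.0 - k * k)
    return ks, err


# Hypothetical test frame: a decaying exponential, i.e. a one-pole signal.
frame = [0.9 ** i for i in range(200)]
coeffs, residual_energy = parcor(frame, order=4)
print(coeffs, residual_energy)
```

For a stable analysis every reflection coefficient has magnitude below one, which is one reason PARCOR parameters are convenient to control and quantize in a synthesizer.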

Implementation of Text-to-Audio Visual Speech Synthesis Using Key Frames of Face Images (키프레임 얼굴영상을 이용한 시청각음성합성 시스템 구현)

  • Kim MyoungGon;Kim JinYoung;Baek SeongJoon
    • MALSORI
    • /
    • no.43
    • /
    • pp.73-88
    • /
    • 2002
  • In this paper, we present a lip-sync algorithm for natural facial synthesis, based on a key-frame method using RBFs (radial basis functions). For lip synthesis, we derive viseme range parameters from the phonemes and their durations produced by the text-to-speech (TTS) system, and we extract from the AV DB the viseme information that corresponds to each phoneme. We apply a dominance function to reflect coarticulation and bilinear interpolation to reduce computation time. Lip-sync is then performed by playing the synthesized images, obtained by interpolating between phonemes, together with the TTS speech.

  • PDF
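
The abstract above does not say which dominance function is used. As a rough sketch of the idea, the following assumes a negative-exponential dominance function of the kind commonly used in coarticulation models; the parameter names and values (`alpha`, `theta`, `power`) and the segment data are hypothetical.

```python
import math


def dominance(t, center, alpha=1.0, theta=20.0, power=1.0):
    """Influence of a viseme centered at `center` (seconds) at time t.

    Assumed form: alpha * exp(-theta * |t - center| ** power).
    """
    return alpha * math.exp(-theta * abs(t - center) ** power)


def blend_lip_parameter(t, segments):
    """Dominance-weighted average of viseme targets at time t.

    segments: list of (center_time_s, target_value) pairs, one per phoneme.
    """
    weights = [dominance(t, center) for center, _ in segments]
    total = sum(weights)
    return sum(w * target for w, (_, target) in zip(weights, segments)) / total


# Hypothetical lip-opening targets for two consecutive phonemes.
segments = [(0.10, 0.2), (0.30, 0.8)]
for t in (0.10, 0.20, 0.30):
    print(t, blend_lip_parameter(t, segments))
```

Because neighboring visemes keep a non-zero weight at every instant, each target is pulled toward its neighbors, which is how a dominance function models coarticulation.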

A Study on Multi-Pulse Speech Coding Method by using Individual Pitch Pulses (개별 피치펄스를 이용한 멀티펄스 음성부호화 방식에 관한 연구)

  • 이시우
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.8 no.5
    • /
    • pp.977-982
    • /
    • 2004
  • In this paper, I propose a new multi-pulse coding method (IP-MPC) that uses individual pitch pulses in order to accommodate the changes in each pitch interval and reduce pitch errors. The extraction rate of individual pitch pulses was 85% for female voices and 96% for male voices. I evaluate the MPC, which uses pitch information from the autocorrelation method, and the IP-MPC, which uses individual pitch pulses. The results show that speech synthesized by the IP-MPC has better quality than speech synthesized by the MPC.
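
The baseline MPC above takes its pitch from the autocorrelation method. A minimal sketch of that baseline follows, assuming a single frame, a plain autocorrelation peak search, and a hypothetical search range of 60 to 400 Hz; it does not reproduce the paper's individual pitch pulse extraction.

```python
import math


def autocorr_pitch(frame, sample_rate, f_min=60.0, f_max=400.0):
    """Estimate F0 (Hz) of one frame from the autocorrelation peak.

    Returns 0.0 when the frame is silent or no positive peak is found.
    """
    if not any(frame):
        return 0.0
    lag_min = int(sample_rate / f_max)
    lag_max = min(int(sample_rate / f_min), len(frame) - 1)
    best_lag, best_val = 0, 0.0
    for lag in range(lag_min, lag_max + 1):
        val = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if val > best_val:
            best_lag, best_val = lag, val
    return sample_rate / best_lag if best_lag else 0.0


# Hypothetical input: 50 ms of a 100 Hz sine sampled at 8 kHz.
fs = 8000
frame = [math.sin(2 * math.pi * 100 * i / fs) for i in range(400)]
print(autocorr_pitch(frame, fs))
```

A frame-level estimate like this yields one pitch value per frame, so it cannot follow period-to-period changes, which is the limitation the abstract says individual pitch pulses are meant to address.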

Sensor Control and Aquisition Information Using Voice I/O (음성 입출력을 이용한 센서 제어 및 정보 획득)

  • Youn, Hyung Jin;Lee, Chang Woo
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference
    • /
    • 2018.05a
    • /
    • pp.495-496
    • /
    • 2018
  • As more and more companies introduce artificial intelligence (AI) speakers, the price of these speakers has become a burden for some users. With some knowledge and dexterity, it is not difficult to build an AI speaker that acquires sensor and environmental information about the house to suit your own taste. In this paper, we implement an AI speaker using a Raspberry Pi, Google Cloud Speech (GCS), and Naver's Clova Speech Synthesis (CSS) API.

  • PDF

Development of TTS for a Human-Robot Interface (휴먼-로봇 인터페이스를 위한 TTS의 개발)

  • Bae Jae-Hyun;Oh Yung-Hwan
    • Proceedings of the KSPS conference
    • /
    • 2006.05a
    • /
    • pp.135-138
    • /
    • 2006
  • Communication between humans and robots is an important part of human-robot interaction, and speech is an easy and intuitive means of communication for human beings. By using speech to communicate with a robot, we can interact with it in a familiar way. In this paper, we developed a TTS system for human-robot interaction. The synthesis algorithms were modified to make efficient use of the robot's restricted resources, and the synthesis database was reconstructed for efficiency. As a result, we reduced the computation time with only slight degradation of speech quality.

  • PDF

A Study on the Synthesis of Korean Speech by Formant VOCODER (포르만트 VOCODER에 의한 한국어 음성합성에 관한 연구)

  • 허강인;이대영
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.14 no.6
    • /
    • pp.699-712
    • /
    • 1989
  • This paper describes a method of Korean speech synthesis using a formant VOCODER. The parameters of speech synthesis are as follows: 1) formants F1, F2, and F3 obtained by the spectrum moment method, and F4 and F5 obtained from the length of the vocal tract; 2) pitch frequencies obtained by the optimum comb method using the AMDF; 3) short-time average energy and short-time mean amplitude; 4) bandwidths determined by the method reported by Fant; 5) voiced/unvoiced discrimination using zero crossings; 6) the excitation wave reported by Rosenberg; 7) Gaussian white noise. The synthesis results are in fairly good agreement with the original speech.

  • PDF
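
Two of the parameters listed above, AMDF-based pitch and zero-crossing voiced/unvoiced discrimination, can be sketched as follows. This is a plain AMDF minimum search, not the paper's optimum comb method, and the search range and the zero-crossing threshold are assumed values.

```python
import math


def amdf_pitch(frame, sample_rate, f_min=60.0, f_max=400.0):
    """Estimate F0 (Hz) as the lag that minimizes the AMDF.

    AMDF(lag) = mean of |x[i] - x[i + lag]|. Returns 0.0 for a silent frame.
    """
    if not any(frame):
        return 0.0
    lag_min = int(sample_rate / f_max)
    lag_max = min(int(sample_rate / f_min), len(frame) // 2)
    best_lag, best_val = 0, float("inf")
    for lag in range(lag_min, lag_max + 1):
        n = len(frame) - lag
        val = sum(abs(frame[i] - frame[i + lag]) for i in range(n)) / n
        if val < best_val:
            best_lag, best_val = lag, val
    return sample_rate / best_lag if best_lag else 0.0


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum((a >= 0.0) != (b >= 0.0) for a, b in zip(frame, frame[1:]))
    return crossings / (len(frame) - 1)


def is_voiced(frame, zcr_threshold=0.25):
    """Voiced if the zero-crossing rate is low. The threshold is an assumption."""
    return zero_crossing_rate(frame) < zcr_threshold


# Hypothetical input: 50 ms of a 100 Hz sine sampled at 8 kHz.
fs = 8000
frame = [math.sin(2 * math.pi * 100 * i / fs) for i in range(400)]
print(amdf_pitch(frame, fs), is_voiced(frame))
```

In a vocoder of this kind the voiced/unvoiced decision typically selects between a periodic excitation at the estimated pitch and a noise excitation.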

Enhanced Maximum Voiced Frequency Estimation Scheme for HTS Using Two-Band Excitation Model

  • Park, Jihoon;Hahn, Minsoo
    • ETRI Journal
    • /
    • v.37 no.6
    • /
    • pp.1211-1219
    • /
    • 2015
  • In a hidden Markov model-based speech synthesis system using a two-band excitation model, a maximum voiced frequency (MVF) is the most important feature as an excitation parameter because the synthetic speech quality depends on the MVF. This paper proposes an enhanced MVF estimation scheme based on a peak picking method. In the proposed scheme, both local peaks and peak lobes are picked from the spectrum of a linear predictive residual signal. The average of the normalized distances of local peaks and peak lobes is calculated and utilized as a feature to estimate an MVF. Experimental results of both objective and subjective tests show that the proposed scheme improves the synthetic speech quality compared with that of a conventional one in a mobile device as well as a PC environment.

On a Detection for the Fundamental Frequency of Speech Signals (음성신호의기본주파수 검출)

  • 배명진
    • Proceedings of the Acoustical Society of Korea Conference
    • /
    • 1994.06c
    • /
    • pp.42-47
    • /
    • 1994
  • A pitch detector is an essential component in a variety of speech processing systems. Besides providing valuable insights into the nature of the excitation source for speech production, the pitch contour of an utterance is useful for speaker recognition and aids for the handicapped, and is required in almost all speech analysis-synthesis systems. Because of the importance of pitch detection, a wide variety of algorithms for pitch detection have been proposed in the speech processing literature. In this paper, we discuss the various types of pitch detection algorithms that have been proposed to date. We then provide performance measurements for seven pitch detection algorithms.

  • PDF

On a Reduction of Computation Time of FFT Cepstrum (FFT 켑스트럼의 처리시간 단축에 관한 연구)

  • Jo, Wang-Rae;Kim, Jong-Kuk;Bae, Myung-Jin
    • Speech Sciences
    • /
    • v.10 no.2
    • /
    • pp.57-64
    • /
    • 2003
  • Cepstrum coefficients are the most popular features for speech recognition and speaker recognition. They are also used for speech synthesis and speech coding, but have the major drawback of a long processing time. In this paper, we propose a new method that reduces the processing time of FFT cepstrum analysis. We use normal-ordered inputs for the FFT function and bit-reversed inputs for the IFFT function. We can therefore omit the bit-reversing process and reduce the processing time of FFT cepstrum analysis.

  • PDF
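
One way to read the idea in the abstract above is sketched below, under stated assumptions: a radix-2 decimation-in-frequency FFT takes natural-order input and leaves its output in bit-reversed order, the log magnitude is applied element by element (so the ordering does not matter), and a decimation-in-time inverse FFT accepts bit-reversed input and returns natural-order output. The function names and the log floor are illustrative, and this is not claimed to be the authors' exact implementation.

```python
import cmath
import math


def fft_dif(x):
    """In-place radix-2 DIF FFT: natural-order input, bit-reversed output."""
    n = len(x)
    span = n // 2
    while span >= 1:
        for start in range(0, n, 2 * span):
            for j in range(span):
                w = cmath.exp(-2j * cmath.pi * j / (2 * span))
                a, b = x[start + j], x[start + j + span]
                x[start + j] = a + b
                x[start + j + span] = (a - b) * w
        span //= 2
    return x


def ifft_dit(x):
    """Radix-2 DIT inverse FFT: bit-reversed input, natural-order output."""
    n = len(x)
    span = 1
    while span < n:
        for start in range(0, n, 2 * span):
            for j in range(span):
                w = cmath.exp(2j * cmath.pi * j / (2 * span))
                a, b = x[start + j], x[start + j + span] * w
                x[start + j] = a + b
                x[start + j + span] = a - b
        span *= 2
    return [v / n for v in x]


def real_cepstrum(frame):
    """Real cepstrum with no explicit bit-reversal permutation."""
    n = len(frame)
    if n == 0 or n & (n - 1):
        raise ValueError("frame length must be a power of two")
    spectrum = fft_dif([complex(v) for v in frame])
    # Element-wise log magnitude; the 1e-12 floor (an assumption) avoids log(0).
    log_mag = [complex(math.log(max(abs(v), 1e-12))) for v in spectrum]
    return [v.real for v in ifft_dit(log_mag)]


# Hypothetical 8-sample frame.
print(real_cepstrum([1.0, 2.0, 0.5, -1.0, 0.25, 3.0, -2.0, 1.5]))
```

The saving comes from pairing the two transforms: the permutation that a standalone FFT needs to restore natural order is never performed, because the inverse transform consumes the bit-reversed data directly.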