• Title/Abstract/Keywords: Speech Synthesizer

Search results: 52 items (processing time 0.023 s)

A Study on the Design of the Real-Time Speech Synthesizer with the LPC Method Using Digital Signal Processor

  • 김홍선
    • Acoustical Society of Korea: Conference Proceedings / Proceedings of the Acoustical Society of Korea 1984 Autumn Conference / pp.63-65 / 1984
  • This paper considers the implementation of a real-time LPC synthesizer using the NEC 77P20, a DSP (digital signal processor) chip that simplifies the digital hardware. The method yields good speech quality at bit rates below 9.6 kbps and has the advantages of flexibility and simplicity.

Glottal Parameters Contributing to the Perception of Loud Voices

  • Yi, So-Pae;Lee, One-Good;Kim, Hyung-Soon
    • Speech Sciences / Vol. 8, No. 1 / pp.143-157 / 2001
  • This paper focused on glottal parameters contributing to the perception of loud voices because energy of a voice is not the only effective factor. We used a formant synthesizer to synthesize loud voices. We divided F0 tilt (the tilt of F0 contour), SQ (Speed Quotient), OQ (Open Quotient) and TL (spectral Tilt Level) into three levels to get different combinations with default values for the other synthesizer parameters. Analysis of listening tests indicated that F0 tilt, SQ, OQ and TL in descending order had significant influence on the perception of loud voices. F0 tilt had a far more significant effect than the others. The influence of SQ increased greatly with the exclusion of F0 tilt as a factor. The interaction between parameters was not significant.

Durational Correlates of Prosodic Categories: The Case of Two Korean Voiceless Coronal Fricatives

  • Yoon, Kyu-Chul
    • Speech Sciences / Vol. 12, No. 1 / pp.89-105 / 2005
  • This paper is a production study of the effects of Korean prosody on two voiceless coronal fricatives /$s^h$/ and /s*/. The target segments were embedded in four prosodic positions: initial to the Intonational Phrase or the Accentual Phrase, and medial to the Accentual Phrase or to the Prosodic Word. Acoustic measurements showed that the durational differences associated with the /$s^h$/ versus /s*/ contrast vary in magnitude in different prosodic positions, confirming the proposal that segmental properties are affected by prosodic categories. This suggests that any speech synthesizer should take into consideration prosodically conditioned durational variation.

Implementation and Evaluation of an HMM-Based Speech Synthesis System for the Tagalog Language

  • ;김경태;김종진
    • Journal of the Phonetic Society of Korea: Malsori / Vol. 68 / pp.49-63 / 2008
  • This paper describes the development and assessment of a hidden Markov model (HMM) based speech synthesis system for Tagalog, the most widely spoken indigenous language of the Philippines. Several aspects of the design process are discussed. To build the synthesizer, a speech database was recorded and phonetically segmented. The resulting speech corpus contains approximately 89 minutes of Tagalog speech organized in 596 spoken utterances. Contextual information was also determined. The quality of the synthesized speech was assessed by subjective tests with 25 native Tagalog speakers as respondents. Experimental results show that the new system achieves a MOS of 3.29, which indicates that it can produce highly intelligible neutral Tagalog speech with stable quality even when only a small amount of speech data is used for HMM training.

A Study of Decision Tree Modeling for Predicting the Prosody of Corpus-based Korean Text-To-Speech Synthesis

  • 강선미;권오일
    • Speech Sciences / Vol. 14, No. 2 / pp.91-103 / 2007
  • The purpose of this paper is to develop a model that can predict the prosody of Korean text-to-speech synthesis using the CART and SKES algorithms. CART tends to favor predictor variables with many instances. Therefore, an F-test-based partitioning method was applied to CART after the number of instances had been reduced by grouping phonemes. In addition, the quality of the text-to-speech synthesis was evaluated after applying the SKES algorithm to data of the same size. For the evaluation, MOS tests were conducted with 30 men and women in their twenties. The results showed that the synthesized speech became clearer and more natural when the SKES algorithm was applied.

Implementation of Formant Speech Analysis/Synthesis System

  • 이준우;손일권;배건성
    • Speech Sciences / Vol. 1 / pp.295-314 / 1997
  • In this study, we implement a flexible formant analysis and synthesis system. In the analysis part, a two-channel approach (speech and EGG signals) is investigated for accurate estimation of formant information. The EGG signal is used to extract the exact pitch information needed for pitch-synchronous LPC analysis and closed-phase LPC analysis. In the synthesis part, the Klatt formant synthesizer is modified so that the user can change synthesis parameters arbitrarily. Experimental results demonstrate the superiority of the two-channel analysis method over the one-channel (speech signal only) method in analysis as well as in synthesis. The implemented system is expected to be very helpful for studying the effects of synthesis parameters on the quality of synthetic speech and for developing a Korean text-to-speech (TTS) system based on formant synthesis.

Speech Analysis Tools for Text-to-Speech Synthesizer

  • 김재인
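    • Acoustical Society of Korea: Conference Proceedings / Proceedings of the Acoustical Society of Korea 1995 12th Workshop on Speech Communication and Signal Processing (SCAS Vol. 12, No. 1) / pp.115-118 / 1995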
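  • This paper discusses the development of a speech analysis tool that is essential for implementing an unrestricted (text-to-speech) speech synthesizer. The tool runs on a PC with a signal processing board. Besides A/D and D/A conversion of speech and spectrogram display, it can insert pitch pulse positions aligned with the glottal closure instant, which makes it easy to build the speech database required by a linear-prediction-based unrestricted synthesizer. It was also designed so that building a speech database for speech recognition, or for an ARS of the kind currently in use, takes little effort and time.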

IMPLEMENTATION OF REAL TIME RELP VOCODER ON THE TMS320C25 DSP CHIP

  • Kwon, Kee-Hyeon;Chong, Jong-Wha
    • Acoustical Society of Korea: Conference Proceedings / Acoustical Society of Korea 1994 FIFTH WESTERN PACIFIC REGIONAL ACOUSTICS CONFERENCE SEOUL KOREA / pp.957-962 / 1994
  • A real-time RELP vocoder is implemented on the TMS320C25 DSP chip. The system is an IBM-PC add-on board composed of an analog input/output unit, a DSP unit, a memory unit, an IBM-PC interface unit, and supporting assembly software. The speech analyzer and synthesizer are implemented in DSP assembly software. On this board, speech parameters such as LPC coefficients, baseband residuals, and signal gains are extracted by the autocorrelation method and an inverse filter, and speech is synthesized by the spectral folding method and a direct-form synthesis filter. The real-time RELP vocoder at 9.6 kbps is then simulated by downloading the program into the DSP program RAM.

Speech Interactive Agent on Car Navigation System Using Embedded ASR/DSR/TTS

  • Lee, Heung-Kyu;Kwon, Oh-Il;Ko, Han-Seok
    • Speech Sciences / Vol. 11, No. 2 / pp.181-192 / 2004
  • This paper presents an efficient speech interactive agent that provides smooth car navigation and telematics services by employing embedded automatic speech recognition (ASR), distributed speech recognition (DSR), and text-to-speech (TTS) modules, all while enabling safe driving. A speech interactive agent is essentially a conversational tool that provides command and control functions to drivers, such as navigation tasks, audio/video manipulation, and e-commerce services, through natural voice/response interactions between user and interface. While the benefits of automatic speech recognition and speech synthesis are well known, the available hardware resources are often limited and the internal communication protocols are too complex to achieve real-time responses. As a result, performance degradation always exists in the embedded hardware system. To implement a speech interactive agent that meets the demands of user commands in real time, we propose optimizing the hardware-dependent architectural code for speed. In particular, we propose a composite solution that combines memory reconfiguration, efficient arithmetic operation conversion, and an effective out-of-vocabulary rejection algorithm, all made suitable for system operation under limited resources.

An Optimization of Speech Database in Corpus-based Speech Synthesis System

  • 장경애;정민화
    • Phonetic Society of Korea: Conference Proceedings / Proceedings of the Phonetic Society of Korea November 2002 Conference / pp.209-213 / 2002
  • This paper describes how to reduce the database of a corpus-based Korean speech synthesizer without degrading speech quality. First, it is proposed that the frequency of every unit in the reduced DB should reflect the frequency of that unit in the Korean language, so the target population of each unit is set proportional to its frequency in a large Korean corpus (780K sentences, 45M phonemes). Second, instances that are used frequently during synthesis should also be retained in the reduced DB. Finally, it is proposed that the frequency of every instance should be reflected in the clustering criterion and used as a criterion for selecting representative instances. Evaluation shows that the proposed methods yield better quality than conventional methods.
