• 제목/요약/키워드: Text-to-speech

검색결과 501건 처리시간 0.032초

음성합성시스템을 위한 음색제어규칙 연구 (A Study on Voice Color Control Rules for Speech Synthesis System)

  • 김진영;엄기완
    • 음성과학
    • /
    • 제2권
    • /
    • pp.25-44
    • /
    • 1997
  • When listening the various speech synthesis systems developed and being used in our country, we find that though the quality of these systems has improved, they lack naturalness. Moreover, since the voice color of these systems are limited to only one recorded speech DB, it is necessary to record another speech DB to create different voice colors. 'Voice Color' is an abstract concept that characterizes voice personality. So speech synthesis systems need a voice color control function to create various voices. The aim of this study is to examine several factors of voice color control rules for the text-to-speech system which makes natural and various voice types for the sounding of synthetic speech. In order to find such rules from natural speech, glottal source parameters and frequency characteristics of the vocal tract for several voice colors have been studied. In this paper voice colors were catalogued as: deep, sonorous, thick, soft, harsh, high tone, shrill, and weak. For the voice source model, the LF-model was used and for the frequency characteristics of vocal tract, the formant frequencies, bandwidths, and amplitudes were used. These acoustic parameters were tested through multiple regression analysis to achieve the general relation between these parameters and voice colors.

  • PDF

발성장애 평가 시 /a/ 모음연장발성 및 문장검사의 켑스트럼 분석 비교 (Comparison of Vowel and Text-Based Cepstral Analysis in Dysphonia Evaluation)

  • 김태환;최정임;이상혁;진성민
    • 대한후두음성언어의학회지
    • /
    • 제26권2호
    • /
    • pp.117-121
    • /
    • 2015
  • Background : Cepstral analysis which is obtained from Fourier transformation of spectrum has been known to be effective indicator to analyze the voice disorder. To evaluate the voice disorder, phonation of sustained vowel /a/ sound or continuous speech have been used but the former was limited to capture hoarseness properly. This study is aimed to compare the effectiveness in analysis of cepstrum between the sustained vowel /a/ sound and continuous speech. Methods : From March 2012 to December 2014, total 72 patients was enrolled in this study, including 24 unilateral vocal cord palsy, vocal nodule and vocal polyp patients, respectively. The entire patient evaluated their voice quality by VHI (Voice Handicap Index) before and after treatment. Phonation of sustained vowel /a/ sample and continuous speech using the first sentence of autumn paragraph was subjected by cepstral analysis and compare the pre-treatment group and post-treatment group. Results : The measured values of pre and post treatment in CPP-a (cepstral peak prominence in /a/ vowel sound) was 13.80, 13.91 in vocal cord palsy, 16.62, 17.99 in vocal cord nodule, 14.19, 18.50 in vocal cord polyp respectively. Values of CPP-s (cepstral peak prominence in text-based speech) in pre and post treatment was 11.11, 12.09 in vocal cord palsy, 12.11, 14.09 in vocal cord nodule, 12.63, 14.17 in vocal cord polyp. All 72 patients showed subjective improvement in VHI after treatment. CPP-a showed statistical improvement only in vocal polyp group, but CPP-s showed statistical improvement in all three groups (p<0.05). Conclusion : In analysis of cepstrum, text-based analysis is more representative in voice disorder than vowel sound speech. So when the acoustic analysis of voice by cepstrum, both phonation of sustained vowel /a/ sound and text based speech should be performed to obtain more accurate result.

  • PDF

Analysis and Interpretation of Intonation Contours of Slovene

  • Ales Dobnikar
    • 대한음성학회:학술대회논문집
    • /
    • 대한음성학회 1996년도 10월 학술대회지
    • /
    • pp.542-547
    • /
    • 1996
  • Prosodic characteristics of natural speech, especially intonation, in many cases represent specific feelings of the speaker at the time of the utterance, with relatively vast variations of speaking styles over the same text. We analyzed a collected speech corpus, recorded with ten Slovene speakers. Interpretation of observed intonation contours was done for the purpose of modelling the intonation contour in synthesis process. We devised a scheme for modeling the intonation contour for different types of intonation units based on the results of analyzing intonation contours. The intonation scheme uses a superpositional approach, which defines the intonation contour as the sum of global (intonation unit) and local (accented syllables or syntactic boundaries) components. Near-to-natural intonation contour was obtained by rules, using only the text of the utterance as input.

  • PDF

한국어 음성합성기의 성능 향상을 위한 합성 단위의 유무성음 분리 (Separation of Voiced Sounds and Unvoiced Sounds for Corpus-based Korean Text-To-Speech)

  • 홍문기;신지영;강선미
    • 음성과학
    • /
    • 제10권2호
    • /
    • pp.7-25
    • /
    • 2003
  • Predicting the right prosodic elements is a key factor in improving the quality of synthesized speech. Prosodic elements include break, pitch, duration and loudness. Pitch, which is realized by Fundamental Frequency (F0), is the most important element relating to the quality of the synthesized speech. However, the previous method for predicting the F0 appears to reveal some problems. If voiced and unvoiced sounds are not correctly classified, it results in wrong prediction of pitch, wrong unit of triphone in synthesizing the voiced and unvoiced sounds, and the sound of click or vibration. This kind of feature is usual in the case of the transformation from the voiced sound to the unvoiced sound or from the unvoiced sound to the voiced sound. Such problem is not resolved by the method of grammar, and it much influences the synthesized sound. Therefore, to steadily acquire the correct value of pitch, in this paper we propose a new model for predicting and classifying the voiced and unvoiced sounds using the CART tool.

  • PDF

포만트 분석/합성 시스템 구현 (Implementation of Formant Speech Analysis/Synthesis System)

  • 이준우;손일권;배건성
    • 음성과학
    • /
    • 제1권
    • /
    • pp.295-314
    • /
    • 1997
  • In this study, we will implement a flexible formant analysis and synthesis system. In the analysis part, the two-channel (i.e., speech & EGG signals) approach is investigated for accurate estimation of formant information. The EGG signal is used for extracting exact pitch information that is needed for the pitch synchronous LPC analysis and closed phase LPC analysis. In the synthesis part, Klatt formant synthesizer is modified so that the user can change synthesis parameters arbitarily. Experimental results demonstrate the superiority of the two-channel analysis method over the one-channel(speech signal only) method in analysis as well as in synthesis. The implemented system is expected to be very helpful for studing the effects of synthesis parameters on the quality of synthetic speech and for the development of Korean text-to-speech(TTS) system with the formant synthesis method.

  • PDF

데이터베이스 분산을 통한 소용량 문자-음성 합성 단말기 구현 (Implementation of text to speech terminal system by distributed database)

  • 김영길;박창현;양윤기
    • 대한전자공학회:학술대회논문집
    • /
    • 대한전자공학회 2003년도 하계종합학술대회 논문집 Ⅳ
    • /
    • pp.2431-2434
    • /
    • 2003
  • In this research, our goal is to realize Korean Distribute TTS system with server/client function in wireless network. The speech databases and some routines of TTS system is stuck with the server which has strong functions and we made Korean speech databases and accomplished research about DB which is suitable for distributed TTS. We designed a terminal has the minimum setting which operate this TTS and designed proper protocol so we will check action of Distributed TTS.

  • PDF

A New Pruning Method for Synthesis Database Reduction Using Weighted Vector Quantization

  • Kim, Sanghun;Lee, Youngjik;Keikichi Hirose
    • The Journal of the Acoustical Society of Korea
    • /
    • 제20권4E호
    • /
    • pp.31-38
    • /
    • 2001
  • A large-scale synthesis database for a unit selection based synthesis method usually retains redundant synthesis unit instances, which are useless to the synthetic speech quality. In this paper, to eliminate those instances from the synthesis database, we proposed a new pruning method called weighted vector quantization (WVQ). The WVQ reflects relative importance of each synthesis unit instance when clustering the similar instances using vector quantization (VQ) technique. The proposed method was compared with two conventional pruning methods through the objective and subjective evaluations of the synthetic speech quality: one to simply limit maximum number of instance, and the other based on normal VQ-based clustering. The proposed method showed the best performance under 50% reduction rates. Over 50% of reduction rates, the synthetic speech quality is not seriously but perceptibly degraded. Using the proposed method, the synthesis database can be efficiently reduced without serious degradation of the synthetic speech quality.

  • PDF

반음절단위를 이용한 한국어 음성합성에 관한 연구 (A Study on the Korean Text-to-Speech Using Demisyllable Units)

  • 윤기선;박성한
    • 대한전자공학회논문지
    • /
    • 제27권10호
    • /
    • pp.138-145
    • /
    • 1990
  • 본 논문에서는 합성단위를 반음절로 하여 적은 데이터 베이스를 차지하면서도, 합성음의 자연스러움을 향상 시키기 위한 한국어 규칙 합성법을 제시한다. 반음절 음성신호를 분석하기 위해 12차 선형 예측법을 사용하며, 합성음의 자연성과 명료성을 위해 음절간 접속 규칙, 모음부의 연결규칙을 개발한다. 또한 신경망 모델을 이용한 음운 변동 규칙과 운율규칙을 적용한다.

  • PDF

Prediction of Prosodic Boundaries Using Dependency Relation

  • Kim, Yeon-Jun;Oh, Yung-Hwan
    • The Journal of the Acoustical Society of Korea
    • /
    • 제18권4E호
    • /
    • pp.26-30
    • /
    • 1999
  • This paper introduces a prosodic phrasing method in Korean to improve the naturalness of speech synthesis, especially in text-to-speech conversion. In prosodic phrasing, it is necessary to understand the structure of a sentence through a language processing procedure, such as part-of-speech (POS) tagging and parsing, since syntactic structure correlates better with the prosodic structure of speech than with other factors. In this paper, the prosodic phrasing procedure is treated from two perspectives: dependency parsing and prosodic phrasing using dependency relations. This is appropriate for Ural-Altaic, since a prosodic boundary in speech usually concurs with a governor of dependency relation. From experimental results, using the proposed method achieved 12% improvement in prosody boundary prediction accuracy with a speech corpus consisting 300 sentences uttered by 3 speakers.

  • PDF

문서-음성 변환 임베디드 시스템 구축에 관한 연구 (A study of Implementing An Embedded System for Conversion from Text to Speech)

  • 이현창;서정만
    • 한국컴퓨터정보학회논문지
    • /
    • 제13권3호
    • /
    • pp.77-83
    • /
    • 2008
  • 최신 IT 정보기술 발전에 힘입어 관련 소프트웨어 개발 및 하드웨어 보급이 보편화될수록 IT 기술 활용에 제한적인 사람들은 IT 정보기술 격차를 더욱 크게 느낄 수 있다. 정보통신기기는 일반사용자 이외에도 신체적으로 장애를 가지고 있는 사람들에게 의사소통 및 정보획득을 위한 중요한 수단이 되었다. 또한, 우리나라는 빠르게 고령화사회로 접어들고 있으면서도 장애를 가지는 이들을 위한 대응 제품을 적기에 제시하지 못하고 있다. 특히, 나이가 들어감에 따라 신체적으로 기능이 저하되는 것 가운데 가장 먼저 시각 기능 저하를 들 수 있다. 시각기능 저하로 인해 장애를 겪고 있는 사람들을 위한 정보전달 수단으로 점자책 등이 존재한다. 그러나 일반 서적에 대한 활용과 비교하면 이용 및 활용을 위한 제반 기술이 상당히 부족한 현실이다. 따라서 본 연구에서는 시각기능 저하 혹은 시각장애를 가지는 사람들에게도 일반 서적의 내용을 이해할 수 있도록 리눅스 기반에 소형 스캐너를 부착한 임베디드 시스템 구축에 관한 내용으로서, 문서에 대해 휴대용 스캐너를 활용하여 문자 추출 및 음성으로 변환해주는 통합 시스템에 관하여 살펴본다.

  • PDF