• Title/Abstract/Keywords: Speech Synthesis

381 search results (processing time: 0.025 s)

음성 파형분절의 지수함수 스므딩 기법에 관한 연구 (The Study on the Exponential Smoothing Method of the Concatenation Parts in the Speech Waveform)

  • 박찬수
    • 한국음향학회:학술대회논문집 / 한국음향학회 1991년도 학술발표회 논문집 / pp.7-10 / 1991
  • In a text-to-speech system, sound units (phonemes, words, phrases, etc.) can be concatenated to produce the required utterance. The quality of the resulting speech depends on factors including the phonological/prosodic contour, the quality of the basic concatenation units, and how well the units join together. Thus, even if each basic sound unit is of high quality, a discontinuity at a concatenation point degrades the quality of the synthesized speech. To solve this problem, a smoothing operation should be carried out at the concatenation points. A major problem, however, is that no parameter smoothing method is yet available for joining the segments together. In this paper, we propose a new algorithm that smooths the unnatural discontinuities that can occur in speech waveform editing. The algorithm uses the exponential smoothing method.
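
The abstract does not describe the algorithm itself, so the following is only a minimal sketch of how simple exponential smoothing could be applied around a concatenation point. The recurrence s[n] = alpha * x[n] + (1 - alpha) * s[n-1] is the standard one; the window width, the value of `alpha`, and the function names `exponential_smooth` and `smooth_joint` are illustrative assumptions, not the author's method.

```python
def exponential_smooth(samples, alpha):
    """Simple exponential smoothing: s[n] = alpha * x[n] + (1 - alpha) * s[n-1]."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    smoothed = []
    prev = None
    for x in samples:
        # The first output equals the first input; later outputs blend in the history.
        prev = x if prev is None else alpha * x + (1.0 - alpha) * prev
        smoothed.append(prev)
    return smoothed


def smooth_joint(left, right, width=4, alpha=0.5):
    """Concatenate two units and smooth only `width` samples on each side of the joint.

    `width` and `alpha` are hypothetical tuning parameters chosen for illustration.
    """
    joined = list(left) + list(right)
    start = max(len(left) - width, 0)
    stop = min(len(left) + width, len(joined))
    joined[start:stop] = exponential_smooth(joined[start:stop], alpha)
    return joined


# Toy units with an abrupt step at the joint (not real speech data).
left = [1.0] * 8
right = [0.0] * 8
raw = left + right
out = smooth_joint(left, right, width=4, alpha=0.5)

# Largest sample-to-sample jump before and after smoothing.
max_jump_before = max(abs(b - a) for a, b in zip(raw, raw[1:]))
max_jump_after = max(abs(b - a) for a, b in zip(out, out[1:]))
print(max_jump_before, max_jump_after)
```

In this toy case the step at the joint is replaced by a decaying tail, so the largest jump is halved. A one-sided recursive filter like this one also delays the signal slightly, which a practical system would have to account for.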

중소형 선박을 위한 음성합성 기반 자동 안전항해 지원 서비스 제공 시스템 개발 (A Development of Automatic Safety Navigation Support Service Providing System for Medium and Small Ships based on Speech Synthesis)

  • 황훈규;김배성;우윤태
    • 한국정보통신학회논문지 / Vol. 25, No. 4 / pp.595-602 / 2021
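  • In Korea, small and medium-sized ships account for a relatively very high share of marine accidents, and statistics show that the number has not decreased significantly even though carrying various safety support equipment has been made mandatory. This paper proposes the architecture of a speech-synthesis-based automatic safe navigation support service system for small and medium-sized ships, which carry relatively less equipment than large ships. The main purpose of the system is to prevent marine accidents by automatically providing synthesized voice safety messages to nearby ships over VHF radio. The safe navigation support service works by linking GPS and AIS to synthesize voice safety support messages and broadcasting them automatically over VHF. To this end, we developed the modules that make up the system, including a data processing module, a stage-by-stage risk analysis module, a speech-synthesis safety message generation module, and a VHF broadcasting equipment control module. We also carried out laboratory-level tests and at-sea demonstration trials with the developed system, and through these verified the usefulness of the service.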

코퍼스 기반 한국어 합성기의 억양 구현 방안 (A Method of Intonation Modeling for Corpus-Based Korean Speech Synthesizer)

  • 김진영;박상언;엄기완;최승호
    • 음성과학 / Vol. 7, No. 2 / pp.193-208 / 2000
  • This paper describes a multi-step method of intonation modeling for a corpus-based Korean speech synthesizer. We selected 1833 sentences covering various syntactic structures and built a corresponding speech corpus uttered by a female announcer. We detected the pitch using laryngograph signals, manually marked the prosodic boundaries on the recorded speech, and carried out part-of-speech tagging and syntactic analysis on the text. The detected pitch was separated into three frequency bands (low, mid, and high), which correspond to the baseline, the word tone, and the syllable tone. We predicted them using the CART method and the Viterbi search algorithm with a word-tone dictionary. Of the collected spoken sentences, 1500 were used for training and 333 for testing. In the layer of word tone modeling, we compared two methods. One predicts the word tone corresponding to the mid-frequency components directly; the other predicts it by multiplying the ratio of the word tone to the baseline by the baseline. The former method resulted in a mean error of 12.37 Hz and the latter in one of 12.41 Hz, so the two are similar. In the layer of syllable tone modeling, the method yielded a mean error rate of less than 8.3% relative to the announcer's mean pitch of 193.56 Hz, so its performance was relatively good.

음성합성을 위한 C-ToBI기반의 중국어 운율 경계와 F0 contour 생성 (Chinese Prosody Generation Based on C-ToBI Representation for Text-to-Speech)

  • 김승원;정옥;이근배;김병창
    • 대한음성학회지:말소리 / No. 53 / pp.75-92 / 2005
  • Prosody Generation Based on C-ToBI Representation for Text-to-Speech. Seungwon Kim, Yu Zheng, Gary Geunbae Lee, Byeongchang Kim. Prosody modeling is critical in developing text-to-speech (TTS) systems, where speech synthesis is used to automatically generate natural speech. In this paper, we present a prosody generation architecture based on the Chinese Tone and Break Index (C-ToBI) representation. ToBI is a multi-tier representation system based on linguistic knowledge to transcribe events in an utterance. A TTS system that adopts ToBI as an intermediate representation is known to exhibit higher flexibility, modularity, and domain/task portability than direct prosody generation TTS systems. However, corpus preparation is very expensive for practical-level performance, because the ToBI-labeled corpus has to be constructed manually by many prosody experts and a large amount of data is normally required for accurate statistical prosody modeling. This paper proposes a new method that transcribes the C-ToBI labels automatically in Chinese speech. We model Chinese prosody generation as a classification problem and apply conditional Maximum Entropy (ME) classification to this problem. We empirically verify the usefulness of various natural language and phonology features to build well-integrated features for the ME framework.
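
For reference, a conditional Maximum Entropy classifier is usually written in the following standard form. The symbols are assumptions introduced here for illustration and do not come from the abstract: $x$ is the linguistic context, $y$ a candidate prosody label (for example a C-ToBI break index), $f_i$ the feature functions, and $\lambda_i$ their learned weights.

```latex
P(y \mid x) = \frac{\exp\left(\sum_i \lambda_i f_i(x, y)\right)}
                   {\sum_{y'} \exp\left(\sum_i \lambda_i f_i(x, y')\right)}
```

Under this form, prediction picks the label $y$ with the highest $P(y \mid x)$; which features the authors actually combined is not stated in the abstract.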

자연스런 인간-로봇 상호작용을 위한 음성 신호의 AM-FM 성분 분해 및 순간 주파수와 순간 진폭의 추정에 관한 연구 (AM-FM Decomposition and Estimation of Instantaneous Frequency and Instantaneous Amplitude of Speech Signals for Natural Human-robot Interaction)

  • 이희영
    • 음성과학 / Vol. 12, No. 4 / pp.53-70 / 2005
  • Vowels in speech signals are multicomponent signals composed of AM-FM components whose instantaneous frequency and instantaneous amplitude are time-varying. Changes of emotional state cause variation in the instantaneous frequencies and instantaneous amplitudes of the AM-FM components. Therefore, it is important to estimate the instantaneous frequencies and instantaneous amplitudes of the AM-FM components accurately in order to extract key information representing emotional states and changes in speech signals. In this paper, firstly, a method for decomposing speech signals into AM-FM components is addressed. Secondly, the fundamental frequency of a vowel sound is estimated by a simple method based on the spectrogram. The estimate of the fundamental frequency is used for decomposing speech signals into AM-FM components. Thirdly, an estimation method is suggested for separating the instantaneous frequencies and the instantaneous amplitudes of the decomposed AM-FM components, based on the Hilbert transform and the demodulation property of the extended Fourier transform. The estimates of the instantaneous frequencies and instantaneous amplitudes can be used for modifying the spectral distribution and for smoothly connecting two words in corpus-based speech synthesis systems.
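
The Hilbert-transform step mentioned in the abstract can be illustrated on a single, already separated component. The sketch below builds the analytic signal by zeroing negative frequencies and reads the instantaneous amplitude and frequency from it. It does not cover the paper's decomposition or its extended-Fourier-transform demodulation; the naive DFT, the function names, and the test tone are assumptions made to keep the example self-contained.

```python
import cmath
import math


def dft(values, inverse=False):
    """Naive O(N^2) discrete Fourier transform (keeps the sketch dependency-free)."""
    n = len(values)
    sign = 1.0 if inverse else -1.0
    result = []
    for k in range(n):
        acc = 0j
        for t, v in enumerate(values):
            acc += v * cmath.exp(sign * 2j * math.pi * k * t / n)
        result.append(acc / n if inverse else acc)
    return result


def analytic_signal(samples):
    """Return x + j*Hilbert(x): keep DC, double positive bins, zero negative bins."""
    n = len(samples)
    gain = [0.0] * n
    gain[0] = 1.0
    for k in range(1, (n + 1) // 2):
        gain[k] = 2.0
    if n % 2 == 0:
        gain[n // 2] = 1.0  # the Nyquist bin is kept once for even lengths
    spectrum = dft(samples)
    return dft([s * g for s, g in zip(spectrum, gain)], inverse=True)


def instantaneous_amplitude_and_frequency(samples, sample_rate):
    """Amplitude is |z[n]|; frequency comes from the phase step between samples."""
    z = analytic_signal(samples)
    amplitude = [abs(v) for v in z]
    frequency = [
        cmath.phase(z[i + 1] * z[i].conjugate()) * sample_rate / (2.0 * math.pi)
        for i in range(len(z) - 1)
    ]
    return amplitude, frequency


# Synthetic single component: a 50 Hz tone with amplitude 0.8 (not speech data).
sample_rate = 800.0
component = [0.8 * math.cos(2.0 * math.pi * 50.0 * n / sample_rate) for n in range(160)]
amplitude, frequency = instantaneous_amplitude_and_frequency(component, sample_rate)
print(round(amplitude[10], 6), round(frequency[10], 6))
```

The tone completes a whole number of cycles in the frame, so the estimates are essentially exact here. Real speech components are not periodic within a frame, and edge effects would then need attention.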

ETRI 소용량 대화체 음성합성시스템 (ETRI small-sized dialog style TTS system)

  • 김종진;김정세;김상훈;박준;이윤근;한민수
    • 대한음성학회:학술대회논문집 / 대한음성학회 2007년도 한국음성과학회 공동학술대회 발표논문집 / pp.217-220 / 2007
  • This study outlines a small-sized, dialog-style ETRI Korean TTS system that applies HMM-based speech synthesis techniques. To build the VoiceFont, 500 dialog-style sentences were used to train the HMMs. Context information about phonemes, syllables, words, phrases, and sentences was extracted fully automatically to build context-dependent HMMs. In training the acoustic model, acoustic features such as mel-cepstra and logF0, together with their delta and delta-delta features, were used. The size of the VoiceFont built through the training is 0.93Mb. The developed HMM-based TTS system was installed on an ARM720T processor running at a 60 MHz clock. To reduce computation time, the MLSA inverse filtering module is implemented in assembly language. The fully implemented system runs 1.73 times faster than real time.
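
The delta and delta-delta features mentioned above are time derivatives of a feature track. The abstract does not say which regression windows were used, so the sketch below assumes the common three-point windows (-0.5, 0, 0.5) and (1, -2, 1) with edge frames repeated; the function names and the toy track are hypothetical.

```python
def pad_edges(track):
    """Repeat the first and last frame so every frame has two neighbours."""
    return [track[0]] + list(track) + [track[-1]]


def delta(track):
    """First-order dynamic feature: 0.5 * (x[t+1] - x[t-1]) (assumed window)."""
    p = pad_edges(track)
    return [0.5 * (p[t + 2] - p[t]) for t in range(len(track))]


def delta_delta(track):
    """Second-order dynamic feature: x[t-1] - 2*x[t] + x[t+1] (assumed window)."""
    p = pad_edges(track)
    return [p[t] - 2.0 * p[t + 1] + p[t + 2] for t in range(len(track))]


# Toy one-dimensional feature track (for example one logF0 value per frame).
track = [1.0, 2.0, 4.0, 7.0]
d1 = delta(track)
d2 = delta_delta(track)
print(d1, d2)
```

In HMM-based synthesis the static, delta, and delta-delta values are typically stacked into one observation vector per frame, which is what lets the generated parameter trajectories stay smooth.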

합성음성평가를 위한 다음절 무의미단어 생성과 이용에 관한 연구 (A Study on the Generation of Multi-syllable Nonsense Wordset for the Assessment of Synthetic Speech)

  • 조철우;김경태;이용주
    • 한국음향학회지 / Vol. 13, No. 5 / pp.51-58 / 1994
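  • Various speech synthesis and recognition techniques have been proposed and put into practical use to provide interfaces through speech, the most natural form of communication between humans and machines. Speech synthesis in particular has seen considerable practical deployment, yet techniques for evaluating it remain at a rudimentary stage. This paper proposes a method for constructing multi-syllable nonsense word sets that can be used in nonsense-word-based evaluation of synthetic speech, and presents a case in which an actually implemented rule-based synthesizer was evaluated with the proposed word set. The proposed word set construction method can be useful for evaluating phoneme-level intelligibility and phonemic context.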

A Study on Pitch Period Detection Algorithm Based on Rotation Transform of AMDF and Threshold

  • 서현수;김남호
    • 융합신호처리학회논문지 / Vol. 7, No. 4 / pp.178-183 / 2006
  • Because much research on speech signal processing is being carried out owing to the recent rapid development of information and communication technology, the pitch period is used as an important element in various speech signal application fields such as speech recognition, speaker identification, speech analysis, and speech synthesis. A variety of time-domain and frequency-domain algorithms for pitch period detection have been suggested. One of the time-domain pitch detection algorithms, AMDF (average magnitude difference function), uses the distance between two valley points as the calculated pitch period. However, it has the problem that selecting the valley points for pitch period detection makes the algorithm complex. Therefore, in this paper we propose the modified AMDF (M-AMDF) algorithm, which recognizes the entire minimum valley points as the pitch period of the speech signal by using the rotation transform of AMDF. In addition, a threshold is set on the beginning portion of speech so that it can be used as the selection criterion for the pitch period. Moreover, the proposed algorithm is compared with conventional ones by means of simulation and shows better properties than the others.
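
As background for the abstract above, here is a minimal sketch of the conventional AMDF only, not the proposed M-AMDF with its rotation transform and threshold. It computes D(lag) as the mean of |x[n] - x[n + lag]| and takes the deepest valley inside a search range as the pitch period. The lag limits, the function names, and the synthetic frame are assumptions for illustration.

```python
import math


def amdf(frame, max_lag):
    """Average magnitude difference function for lags 0..max_lag."""
    n = len(frame)
    values = []
    for lag in range(max_lag + 1):
        diffs = [abs(frame[i] - frame[i + lag]) for i in range(n - lag)]
        values.append(sum(diffs) / len(diffs))
    return values


def estimate_pitch_period(frame, min_lag, max_lag):
    """Return the lag (in samples) of the deepest AMDF valley in [min_lag, max_lag]."""
    d = amdf(frame, max_lag)
    return min(range(min_lag, max_lag + 1), key=lambda lag: d[lag])


# Synthetic voiced-like frame: 100 Hz fundamental plus a weaker second harmonic.
sample_rate = 8000
f0 = 100.0
frame = [
    math.sin(2.0 * math.pi * f0 * n / sample_rate)
    + 0.5 * math.sin(2.0 * math.pi * 2.0 * f0 * n / sample_rate)
    for n in range(400)
]

period = estimate_pitch_period(frame, min_lag=20, max_lag=150)
pitch_hz = sample_rate / period
print(period, pitch_hz)
```

The search range is what keeps this simple version from locking onto valleys at multiples of the true period (160, 240, ... samples here). That valley-selection difficulty is the issue the abstract says M-AMDF is designed to address.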

Fillers in the Hong Kong Corpus of Spoken English (HKCSE)

  • Seto, Andy
    • 아시아태평양코퍼스연구 / Vol. 2, No. 1 / pp.13-22 / 2021
  • The present study employed an analytical framework characterised by a synthesis of quantitative and qualitative analyses, together with a specially designed computer program, SpeechActConc, to examine speech acts in business communication. The naturally occurring data from the audio recordings and the prosodic transcriptions of the business sub-corpora of the HKCSE (prosodic) were manually annotated with a speech act taxonomy to find the frequency of fillers, the co-occurring patterns of fillers with other speech acts, and the linguistic realisations of fillers. The discoursal function of fillers, to sustain the discourse or to hold the floor, has diverse linguistic realisations, ranging from a sound (e.g. 'uhuh') and a word (e.g. 'well') to sounds (e.g. 'um er') and words, namely a phrase (e.g. 'sort of') and a clause (e.g. 'you know'). Some are even combinations of sound(s) and word(s) (e.g. 'and um', 'yes er um', 'sort of erm'). Among the five most frequent linguistic realisations of fillers, 'er' and 'um' are the most common, found in all six genres with relatively higher percentages of occurrence. The remaining more frequent realisations consist of a clause ('you know'), a word ('yeah') and a sound ('erm'). These common forms are syntactically simpler than the less frequent realisations found in the genres. The co-occurring patterns of fillers and other speech acts are diverse. The more common speech acts co-occurring with fillers include informing and answering. The findings show that fillers are not only frequently used by speakers in spontaneous conversation but also mostly represented in sounds or non-linguistic realisations.

조음 로보틱스 (Articulatory robotics)

  • 남호성
    • 말소리와 음성과학 / Vol. 13, No. 2 / pp.1-7 / 2021
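  • Speech can be described as a spatiotemporal coordination structure of constriction movements occurring at individual articulators (lips, tongue tip, tongue body, velum, glottis). As with other human movements (e.g., grasping), each constriction movement is a linguistically meaningful task, and each task is carried out through the synergy of the basic elements involved in it. This study discusses, from a robotics perspective, how such speech tasks can be dynamically linked to joints, their basic elements. Furthermore, by introducing the basic principles of robotics to the field of speech science, it aims to enable a deeper understanding of how speech, as movement, is produced, and to provide the theoretical foundation needed to build a talking machine that imitates actual human articulation.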