• 제목/요약/키워드: Corpus-based synthesis

검색결과 34건 처리시간 0.023초

Decision-Tree-Based Markov Model for Phrase Break Prediction

  • Kim, Sang-Hun;Oh, Seung-Shin
    • ETRI Journal
    • /
    • 제29권4호
    • /
    • pp.527-529
    • /
    • 2007
  • In this paper, a decision-tree-based Markov model for phrase break prediction is proposed. The model takes advantage of the non-homogeneous-features-based classification ability of decision tree and temporal break sequence modeling based on the Markov process. For this experiment, a text corpus tagged with parts-of-speech and three break strength levels is prepared and evaluated. The complex feature set, textual conditions, and prior knowledge are utilized; and chunking rules are applied to the search results. The proposed model shows an error reduction rate of about 11.6% compared to the conventional classification model.

  • PDF

코퍼스 기반 한국어 합성기의 억양 구현 방안 (A Method of Intonation Modeling for Corpus-Based Korean Speech Synthesizer)

  • 김진영;박상언;엄기완;최승호
    • 음성과학
    • /
    • 제7권2호
    • /
    • pp.193-208
    • /
    • 2000
  • This paper describes a multi-step method of intonation modeling for corpus-based Korean speech synthesizer. We selected 1833 sentences considering various syntactic structures and built a corresponding speech corpus uttered by a female announcer. We detected the pitch using laryngograph signals and manually marked the prosodic boundaries on recorded speech, and carried out the tagging of part-of-speech and syntactic analysis on the text. The detected pitch was separated into 3 frequency bands of low, mid, high frequency components which correspond to the baseline, the word tone, and the syllable tone. We predicted them using the CART method and the Viterbi search algorithm with a word-tone-dictionary. In the collected spoken sentences, 1500 sentences were trained and 333 sentences were tested. In the layer of word tone modeling, we compared two methods. One is to predict the word tone corresponding to the mid-frequency components directly and the other is to predict it by multiplying the ratio of the word tone to the baseline by the baseline. The former method resulted in a mean error of 12.37 Hz and the latter in one of 12.41 Hz, similar to each other. In the layer of syllable tone modeling, it resulted in a mean error rate less than 8.3% comparing with the mean pitch, 193.56 Hz of the announcer, so its performance was relatively good.

  • PDF

HMM 기반의 한국어 음성합성에서 지속시간 모델 파라미터 제어 (Control of Duration Model Parameters in HMM-based Korean Speech Synthesis)

  • 김일환;배건성
    • 음성과학
    • /
    • 제15권4호
    • /
    • pp.97-105
    • /
    • 2008
  • Nowadays an HMM-based text-to-speech system (HTS) has been very widely studied because it needs less memory and low computation complexity and is suitable for embedded systems in comparison with a corpus-based unit concatenation text-to-speech one. It also has the advantage that voice characteristics and the speaking rate of the synthetic speech can be converted easily by modifying HMM parameters appropriately. We implemented an HMM-based Korean text-to-speech system using a small size Korean speech DB and proposes a method to increase the naturalness of the synthetic speech by controlling duration model parameters in the HMM-based Korean text-to speech system. We performed a paired comparison test to verify that theses techniques are effective. The test result with the preference scores of 73.8% has shown the improvement of the naturalness of the synthetic speech through controlling the duration model parameters.

  • PDF

시간적 분해에 기반한 F0 궤적 모델에 관한 연구 (F0 Contour Model based on Temporal Decomposition)

  • 변효진;김연준;오영환
    • 한국음향학회지
    • /
    • 제18권8호
    • /
    • pp.75-83
    • /
    • 1999
  • 본 논문에서는 음성합성의 억양 제어를 위한 새로운 F0 궤적 모델을 제안한다. 제안한 모델은 발성된 문장의 F0 궤적을 중첩가산되는 사건들로 분해하고, 각 사건들을 가우시안 종모양의 사건함수로 모델링한다. 그리고 제안한 모델을 위한 파라미터 추정 알고리즘을 제시한다. 제안한 모델은 특정한 음운론적 지식에 기반하지 않았으며, F0 궤적의 분석단계와 합성단계에 모두 사용 가능하다. 제안한 모델의 성능평가를 위해 다양한 장르에서 추출한 여러 형태의 500문장의 코퍼스를 구축하고, 이를 전문 아나운서에게 발성하게 하여 구축한 음성코퍼스로 실험한 결과, 원음성의 F0 궤적과 제안한 모델에 의해 합성된 F0 궤적의 평균 제곱 오류근이 7.87Hz이었다.

  • PDF

자동 음성 분할을 위한 음향 모델링 및 에너지 기반 후처리 (Acoustic Modeling and Energy-Based Postprocessing for Automatic Speech Segmentation)

  • 박혜영;김형순
    • 대한음성학회지:말소리
    • /
    • 제43호
    • /
    • pp.137-150
    • /
    • 2002
  • Speech segmentation at phoneme level is important for corpus-based text-to-speech synthesis. In this paper, we examine acoustic modeling methods to improve the performance of automatic speech segmentation system based on Hidden Markov Model (HMM). We compare monophone and triphone models, and evaluate several model training approaches. In addition, we employ an energy-based postprocessing scheme to make correction of frequent boundary location errors between silence and speech sounds. Experimental results show that our system provides 71.3% and 84.2% correct boundary locations given tolerance of 10 ms and 20 ms, respectively.

  • PDF

코퍼스 기반 무제한 단어 중국어 TTS (Corpus Based Unrestricted vocabulary Mandarin TTS)

  • ;하주홍;김병창;이근배
    • 대한음성학회:학술대회논문집
    • /
    • 대한음성학회 2003년도 10월 학술대회지
    • /
    • pp.175-179
    • /
    • 2003
  • In order to produce a high quality (intelligibility and naturalness) synthesized speech, it is very important to get an accurate grapheme-to-phoneme conversion and prosody model. In this paper, we analyzed Chinese texts using a segmentation, POS tagging and unknown word recognition. We present a grapheme-to-phoneme conversion using a dictionary-based and rule-based method. We constructed a prosody model using a probabilistic method and a decision tree-based error correction method. According to the result from the above analysis, we can successfully select and concatenate exact synthesis unit of syllables from the Chinese Synthesis DB.

  • PDF

현대 서울말 평서문에 나타나는 억양 연구 - 어말어미 "-아/어, -지요" 와 "-ㅂ/습니다" 를 중심으로 - (An Intonation Study of Predicate ending in Current Korean - From final endings of ${\ulcorner}$-a/e, $t{\int}ijo$${\lrcorner}$ and ${\ulcorner}$p/simnida${\lrcorner}$ -)

  • 유기원
    • 대한음성학회:학술대회논문집
    • /
    • 대한음성학회 2005년도 춘계 학술대회 발표논문집
    • /
    • pp.3-7
    • /
    • 2005
  • This research is for finding prototypes and characteristics of intonation found in ${\ulcorner}$-a/e, $t{\int}ijo$<${\lrcorner}$ and ${\ulcorner}$p/simnida${\lrcorner}$ among modern Korean predicate statements by constructing spoken corpus based on the current radio broadcast. So the result of the study is as follows. : (1) The construction of the balanced spoken corpus and the standard for boundary determination of rhythm are needed for the intonation model of speech synthesis. (2) Korean intonation units have the splited word tone which includes the nuclear tone and the pre-nuclear tone makes unclear tone more detailed. (3) I made man and woman intonation models individually through t-test of SPSS. (4) The standard intonation model is devided '-ajo'type and '-nida'type

  • PDF

코퍼스 기반 음성합성기를 위한 합성단위 경계 스펙트럼 평탄화 알고리즘 (A Spectral Smoothing Algorithm for Unit Concatenating Speech Synthesis)

  • 김상진;장경애;한민수
    • 대한음성학회지:말소리
    • /
    • 제56호
    • /
    • pp.225-235
    • /
    • 2005
  • Speech unit concatenation with a large database is presently the most popular method for speech synthesis. In this approach, the mismatches at the unit boundaries are unavoidable and become one of the reasons for quality degradation. This paper proposes an algorithm to reduce undesired discontinuities between the subsequent units. Optimal matching points are calculated in two steps. Firstly, the fullback-Leibler distance measurement is utilized for the spectral matching, then the unit sliding and the overlap windowing are used for the waveform matching. The proposed algorithm is implemented for the corpus-based unit concatenating Korean text-to-speech system that has an automatically labeled database. Experimental results show that our algorithm is fairly better than the raw concatenation or the overlap smoothing method.

  • PDF

Analysis of the Timing of Spoken Korean Using a Classification and Regression Tree (CART) Model

  • Chung, Hyun-Song;Huckvale, Mark
    • 음성과학
    • /
    • 제8권1호
    • /
    • pp.77-91
    • /
    • 2001
  • This paper investigates the timing of Korean spoken in a news-reading speech style in order to improve the naturalness of durations used in Korean speech synthesis. Each segment in a corpus of 671 read sentences was annotated with 69 segmental and prosodic features so that the measured duration could be correlated with the context in which it occurred. A CART model based on the features showed a correlation coefficient of 0.79 with an RMSE (root mean squared prediction error) of 23 ms between actual and predicted durations in reserved test data. These results are comparable with recent published results in Korean and similar to results found in other languages. An analysis of the classification tree shows that phrasal structure has the greatest effect on the segment duration, followed by syllable structure and the manner features of surrounding segments. The place features of surrounding segments only have small effects. The model has application in Korean speech synthesis systems.

  • PDF

한국어 음성합성기의 운율 예측을 위한 의사결정트리 모델에 관한 연구 (A Study of Decision Tree Modeling for Predicting the Prosody of Corpus-based Korean Text-To-Speech Synthesis)

  • 강선미;권오일
    • 음성과학
    • /
    • 제14권2호
    • /
    • pp.91-103
    • /
    • 2007
  • The purpose of this paper is to develop a model enabling to predict the prosody of Korean text-to-speech synthesis using the CART and SKES algorithms. CART prefers a prediction variable in many instances. Therefore, a partition method by F-Test was applied to CART which had reduced the number of instances by grouping phonemes. Furthermore, the quality of the text-to-speech synthesis was evaluated after applying the SKES algorithm to the same data size. For the evaluation, MOS tests were performed on 30 men and women in their twenties. Results showed that the synthesized speech was improved in a more clear and natural manner by applying the SKES algorithm.

  • PDF