• Title/Summary/Keyword: prosody prediction

Search Result 11, Processing Time 0.023 seconds

A Study of Decision Tree Modeling for Predicting the Prosody of Corpus-based Korean Text-To-Speech Synthesis (한국어 음성합성기의 운율 예측을 위한 의사결정트리 모델에 관한 연구)

  • Kang, Sun-Mee;Kwon, Oh-Il
    • Speech Sciences
    • /
    • v.14 no.2
    • /
    • pp.91-103
    • /
    • 2007
  • The purpose of this paper is to develop a model enabling to predict the prosody of Korean text-to-speech synthesis using the CART and SKES algorithms. CART prefers a prediction variable in many instances. Therefore, a partition method by F-Test was applied to CART which had reduced the number of instances by grouping phonemes. Furthermore, the quality of the text-to-speech synthesis was evaluated after applying the SKES algorithm to the same data size. For the evaluation, MOS tests were performed on 30 men and women in their twenties. Results showed that the synthesized speech was improved in a more clear and natural manner by applying the SKES algorithm.

  • PDF

Decision-Tree-Based Markov Model for Phrase Break Prediction

  • Kim, Sang-Hun;Oh, Seung-Shin
    • ETRI Journal
    • /
    • v.29 no.4
    • /
    • pp.527-529
    • /
    • 2007
  • In this paper, a decision-tree-based Markov model for phrase break prediction is proposed. The model takes advantage of the non-homogeneous-features-based classification ability of decision tree and temporal break sequence modeling based on the Markov process. For this experiment, a text corpus tagged with parts-of-speech and three break strength levels is prepared and evaluated. The complex feature set, textual conditions, and prior knowledge are utilized; and chunking rules are applied to the search results. The proposed model shows an error reduction rate of about 11.6% compared to the conventional classification model.

  • PDF

Improvements on Phrase Breaks Prediction Using CRF (Conditional Random Fields) (CRF를 이용한 운율경계추성 성능개선)

  • Kim Seung-Won;Lee Geun-Bae;Kim Byeong-Chang
    • MALSORI
    • /
    • no.57
    • /
    • pp.139-152
    • /
    • 2006
  • In this paper, we present a phrase break prediction method using CRF(Conditional Random Fields), which has good performance at classification problems. The phrase break prediction problem was mapped into a classification problem in our research. We trained the CRF using the various linguistic features which was extracted from POS(Part Of Speech) tag, lexicon, length of word, and location of word in the sentences. Combined linguistic features were used in the experiments, and we could collect some linguistic features which generate good performance in the phrase break prediction. From the results of experiments, we can see that the proposed method shows improved performance on previous methods. Additionally, because the linguistic features are independent of each other in our research, the proposed method has higher flexibility than other methods.

  • PDF

The Korean Text-to-speech Using Syllable Units (음절 단위를 이용한 한국어 음성 합성)

  • 김병수;윤기선;박성한
    • Journal of the Korean Institute of Telematics and Electronics
    • /
    • v.27 no.1
    • /
    • pp.143-150
    • /
    • 1990
  • In this paper, a rule-based method for improving the intelligibility of synthetic speech is proposed. A 12-pole linear prediction coding method is used to model syllable speech signals. A syllable concatenation rule for pause and frame rejection between syllables is developed to improve the naturalness of the synthetic speech. In addition, phonoligical structure transform rule and prosody rule are applied to the synthetic speech by LPC. The illustrative results demonstrate that the synthetic speech obtained by applying these rules has better naturalness than the synthetic speech by LPC.

  • PDF

Sums-of-Products Models for Korean Segment Duration Prediction

  • Chung, Hyun-Song
    • Speech Sciences
    • /
    • v.10 no.4
    • /
    • pp.7-21
    • /
    • 2003
  • Sums-of-Products models were built for segment duration prediction of spoken Korean. An experiment for the modelling was carried out to apply the results to Korean text-to-speech synthesis systems. 670 read sentences were analyzed. trained and tested for the construction of the duration models. Traditional sequential rule systems were extended to simple additive, multiplicative and additive-multiplicative models based on Sums-of-Products modelling. The parameters used in the modelling include the properties of the target segment and its neighbors and the target segment's position in the prosodic structure. Two optimisation strategies were used: the downhill simplex method and the simulated annealing method. The performance of the models was measured by the correlation coefficient and the root mean squared prediction error (RMSE) between actual and predicted duration in the test data. The best performance was obtained when the data was trained and tested by ' additive-multiplicative models. ' The correlation for the vowel duration prediction was 0.69 and the RMSE. 31.80 ms. while the correlation for the consonant duration prediction was 0.54 and the RMSE. 29.02 ms. The results were not good enough to be applied to the real-time text-to-speech systems. Further investigation of feature interactions is required for the better performance of the Sums-of-Products models.

  • PDF

Prediction of Prosodic Boundaries Using Dependency Relation

  • Kim, Yeon-Jun;Oh, Yung-Hwan
    • The Journal of the Acoustical Society of Korea
    • /
    • v.18 no.4E
    • /
    • pp.26-30
    • /
    • 1999
  • This paper introduces a prosodic phrasing method in Korean to improve the naturalness of speech synthesis, especially in text-to-speech conversion. In prosodic phrasing, it is necessary to understand the structure of a sentence through a language processing procedure, such as part-of-speech (POS) tagging and parsing, since syntactic structure correlates better with the prosodic structure of speech than with other factors. In this paper, the prosodic phrasing procedure is treated from two perspectives: dependency parsing and prosodic phrasing using dependency relations. This is appropriate for Ural-Altaic, since a prosodic boundary in speech usually concurs with a governor of dependency relation. From experimental results, using the proposed method achieved 12% improvement in prosody boundary prediction accuracy with a speech corpus consisting 300 sentences uttered by 3 speakers.

  • PDF

Prosody Boundary Index Prediction Model for Continuous Speech Recognition and Speech Synthesis (연속음성 인식 및 합성을 위한 운율 경계강도 예측 모델)

  • 강평수
    • Proceedings of the Acoustical Society of Korea Conference
    • /
    • 1998.06c
    • /
    • pp.99-102
    • /
    • 1998
  • 본 연구에서는 연속음 인식과 합성을 위한 경계강도 예측 모델을 제안한다. 운율 경계 강도는 음성 합성에서는 운율구 사이의 휴지기의 길이 조절로 합성음의 자연도에 기여를 하고 연속음 인식에서는 인식과정에서 나타나는 후보문장의 선별 과정에 특징변수가 되어 인식률 향상에 큰 역할을 한다. 음성학적으로 발화된 문장은 큰 경계 단위로 볼 때 운율구 형태로 이루어졌다고 볼 수 있으며 구의 경계는 문장의 문법적인 특징과 관련을 지을 수 있게 된다. 본 논문에서는 운율 경계 강도 수준을 4로 하고 문법적인 특징으로는 트리구조 방법으로 결정된 오른쪽 가지의 수식의 깊이(rd)와 link grammar방법으로 결정된 음절수(syl), 연결거리(torig)를 bigram 모형과 결합하여 운율적 경계 강도를 예측한다. 예측 모형으로는 다중 회귀 모형과 Marcov 모형을 제안한다. 이들 모형으로 낭독체 200 문장에 대해 실험한 결과 76%로 경계 강도를 예측할 수 있었다.

  • PDF

A Study on the Korean Text-to-Speech Using Demisyllable Units (반음절단위를 이용한 한국어 음성합성에 관한 연구)

  • Yun, Gi-Sun;Park, Sung-Han
    • Journal of the Korean Institute of Telematics and Electronics
    • /
    • v.27 no.10
    • /
    • pp.138-145
    • /
    • 1990
  • This paper present a rule-based speech synthesis method for improving the naturalness of synthetic speech and using the small data base based on demisyllable units. A 12-pole Linear Prediction Coding method is used to analyses demisyllable speech signals. A syllable and vowel concatenation rule is developed to improve the naturalness and intelligibility of the synthetic speech. in addiion, phonological structure transform rule using neural net and prosody rules are applied to the synthetic speech.

  • PDF

Analysis of the Timing of Spoken Korean Using a Classification and Regression Tree (CART) Model

  • Chung, Hyun-Song;Huckvale, Mark
    • Speech Sciences
    • /
    • v.8 no.1
    • /
    • pp.77-91
    • /
    • 2001
  • This paper investigates the timing of Korean spoken in a news-reading speech style in order to improve the naturalness of durations used in Korean speech synthesis. Each segment in a corpus of 671 read sentences was annotated with 69 segmental and prosodic features so that the measured duration could be correlated with the context in which it occurred. A CART model based on the features showed a correlation coefficient of 0.79 with an RMSE (root mean squared prediction error) of 23 ms between actual and predicted durations in reserved test data. These results are comparable with recent published results in Korean and similar to results found in other languages. An analysis of the classification tree shows that phrasal structure has the greatest effect on the segment duration, followed by syllable structure and the manner features of surrounding segments. The place features of surrounding segments only have small effects. The model has application in Korean speech synthesis systems.

  • PDF

Generating Korean Energy Contours Using Vector-regression Tree (벡터 회귀 트리를 이용한 한국어 에너지 궤적 생성)

  • 이상호;오영환
    • The Journal of the Acoustical Society of Korea
    • /
    • v.22 no.4
    • /
    • pp.323-328
    • /
    • 2003
  • This study describes an energy contour generation method for Korean n systems. We propose a vector-regression tree, which is a vector version of a scalar regression tree. A vector-regression tree predicts a response vector for an unknown feature vector. In our study, the tree yields a vector containing ten sampled energy values for each phone. After collecting 500 sentences and its corresponding speech corpus, we trained trees on 300 sentences and tested them on 200 sentences. We construct a bagged tree and a born again one to improve the performance of contour prediction. In the experiment, we got a 0.803 correlation coefficient for the observed and predicted energy values.