• Title/Summary/Keyword: Speech style

ETRI small-sized dialog style TTS system (ETRI 소용량 대화체 음성합성시스템)

  • Kim, Jong-Jin; Kim, Jeong-Se; Kim, Sang-Hun; Park, Jun; Lee, Yun-Keun; Hahn, Min-Soo
    • Proceedings of the KSPS conference / 2007.05a / pp.217-220 / 2007
  • This study outlines a small-sized, dialog-style ETRI Korean TTS system that applies HMM-based speech synthesis techniques. To build the VoiceFont, 500 dialog-style sentences were used to train the HMMs. Context information about phonemes, syllables, words, phrases, and sentences was extracted fully automatically to build context-dependent HMMs. The acoustic model was trained on acoustic features such as mel-cepstra and log F0 with its delta and delta-delta. The VoiceFont built through this training is 0.93 Mb in size. The HMM-based TTS system was installed on an ARM720T processor running at a 60 MHz clock rate. To reduce computation time, the MLSA inverse filtering module was implemented in assembly language. The fully implemented system runs 1.73 times faster than real time.

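The abstract above lists log F0 with delta and delta-delta features but does not say how they were computed. As a minimal sketch, assuming the three-frame windows commonly used in HMM-based synthesis (delta = 0.5 * (x[t+1] - x[t-1]), delta-delta = x[t-1] - 2*x[t] + x[t+1]) and edge frames repeated as padding, dynamic features for a log F0 track could be derived as follows. The function names and the F0 values are hypothetical and are not taken from the paper.

```python
import math


def apply_window(track, window):
    """Apply a 3-tap window centered on each frame.

    The first and last frames are repeated once as padding (an assumption).
    """
    padded = [track[0]] + list(track) + [track[-1]]
    return [
        window[0] * padded[i] + window[1] * padded[i + 1] + window[2] * padded[i + 2]
        for i in range(len(track))
    ]


def add_dynamic_features(track):
    """Return one (static, delta, delta-delta) tuple per frame."""
    delta = apply_window(track, (-0.5, 0.0, 0.5))    # assumed delta window
    delta2 = apply_window(track, (1.0, -2.0, 1.0))   # assumed delta-delta window
    return list(zip(track, delta, delta2))


# Hypothetical voiced F0 values in Hz, one per frame.
f0_hz = [100.0, 110.0, 121.0, 110.0]
log_f0 = [math.log(f) for f in f0_hz]

for static, d1, d2 in add_dynamic_features(log_f0):
    print(f"{static:.4f} {d1:+.4f} {d2:+.4f}")
```

Unvoiced frames have no F0 value and would need separate handling, which this sketch leaves out.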

Grapheme-to-Phoneme Conversion and Prosody Modeling for Korean Conversational Style TTS (한국어 대화체 TTS 개발을 위한 발음 및 운율 추정)

  • Lee, Jin-Sik; Kim, Seung-Won; Kim, Byeong-Chang; Lee, Geun-Bae
    • Proceedings of the KSPS conference / 2006.11a / pp.135-138 / 2006
  • In this paper, we introduce a method for extracting grapheme-to-phoneme conversion rules from the transcriptions of a speech synthesis database, and a prosody modeling method that uses a light version of ToBI, for a Korean conversational-style TTS. We focused on representing the characteristics of the conversational speech style, and the experimental results show that the proposed methods are suitable for developing a Korean conversational-style TTS.

Performance Comparison of Multiple-Model Speech Recognizer with Multi-Style Training Method Under Noisy Environments (잡음 환경하에서의 다 모델 기반인식기와 다 스타일 학습방법과의 성능비교)

  • Yoon, Jang-Hyuk; Chung, Young-Joo
    • The Journal of the Acoustical Society of Korea / v.29 no.2E / pp.100-106 / 2010
  • The multiple-model speech recognizer has been shown to be quite successful in noisy speech recognition. However, its performance has usually been tested with general speech front-ends that do not incorporate any noise-adaptive algorithms. To evaluate the effectiveness of the multiple-model framework in noisy speech recognition accurately, we used state-of-the-art front-ends and compared its performance with the well-known multi-style training method. In addition, we improved the multiple-model speech recognizer by employing N-best reference HMMs for interpolation and by using multiple SNR levels to train each reference HMM.
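
The abstract mentions training with multiple SNR levels but does not describe how the noisy training data were prepared. One common way to produce such data, shown here only as a hedged sketch, is to scale a noise signal so that the speech-to-noise power ratio matches a target SNR in dB before adding it to the clean speech. The function names, sample values, and SNR levels below are hypothetical.

```python
import math


def power(samples):
    """Mean squared amplitude of a signal."""
    return sum(s * s for s in samples) / len(samples)


def mix_at_snr(speech, noise, snr_db):
    """Add noise to speech, scaled so that 10*log10(P_speech / P_noise) == snr_db."""
    gain = math.sqrt(power(speech) / (power(noise) * 10.0 ** (snr_db / 10.0)))
    return [s + gain * n for s, n in zip(speech, noise)]


def measured_snr_db(speech, noisy):
    """Recover the SNR of a mixture when the clean speech is known."""
    residual = [y - s for s, y in zip(speech, noisy)]
    return 10.0 * math.log10(power(speech) / power(residual))


# Hypothetical toy signals; real data would be sampled audio of equal length.
speech = [1.0, -1.0, 1.0, -1.0]
noise = [0.2, 0.1, -0.3, 0.4]

for snr_db in (0, 10, 20):  # assumed SNR levels, not taken from the paper
    noisy = mix_at_snr(speech, noise, snr_db)
    print(snr_db, round(measured_snr_db(speech, noisy), 6))
```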

The English Cause-Focused Causal Construction

  • Kim, Yangsoon
    • International Journal of Advanced Culture Technology / v.8 no.4 / pp.161-166 / 2020
  • The primary aim of this paper is to analyze the resultative adjunct clause, i.e., the (thus/thereby/hence) ~ing participle, and to provide an explicit syntactic, semantic, and sociolinguistic explanation of what gives rise to the cause-focused causal construction with a resultative (thus/thereby/hence) ~ing participle in English. Either the cause clause or the effect clause may come first. This study explores the recent style shift of causal constructions from the effect-focused pattern to the cause-focused pattern. We argue that the increasing number of cause-focused main clauses with a resultative ~ing participle clause reflects a process of style evolution that improves speech/writing style in many respects, including syntactic simplification, clarification of sentence meaning with emphasis on the focused clauses, and improvement of the flow of speech/writing. The style shift found in the English resultative adjunct clauses, i.e., the (thus/hence/thereby) ~ing participle constructions, proves to be a style evolution from syntactic, semantic, and sociolinguistic points of view.

Analysis of Singing Technique of Mongolian Traditional Singing Called Khoomei (몽골 전통 발성 흐미의 발성 방법 분석에 대한 사례연구)

  • Nam, Do-Hyun; Paik, Jae-Yeon; Hwang, Yoen-Shin; Choi, Hong-Shik
    • Speech Sciences / v.15 no.3 / pp.145-156 / 2008
  • The goal of this study was to investigate the acoustic and physiologic characteristics of two phonation types of 'Khoomei', a traditional singing style of people who live around the Altai mountains or the Mongolia region. Khoomei produces two pitches simultaneously: a high melody pitch is perceived along with a low drone pitch. Sygyt and Kargyraa are the most popular and identifiable styles, and they are recognized as different sounds depending on the method of voice production. Two trained Mongolian singers participated, each with at least 5 to 6 years of experience using the technique. The characteristics of this voice production were measured using a flexible fiberscope, stroboscopy, Lx Speech Studio, Spead, and Doctor Speech. In the Sygyt style, very high vocal fold closure (71.50%), with contact of both the true and false vocal folds, and strong breathing support were observed. Tongue height and harmonics also increased (around 10 dB) with movement of the resonance cavity. In contrast, the Kargyraa sound had a very low pitch, with a relaxed stomach, less laryngeal tension, and lower vocal fold contact (69.50%) than the hard Sygyt sound, and the tongue was not raised during phonation. 'Khoomei' phonation can be produced by strong contact of both the true and false vocal folds and by increasing the harmonics.

Intelligibility Improvement Benefit of Clear Speech and Korean Stops

  • Kang, Kyoung-Ho
    • Phonetics and Speech Sciences / v.2 no.1 / pp.3-11 / 2010
  • The present study confirmed the intelligibility improvement benefit of clear speech by investigating the intelligibility of Korean stops produced in three speaking styles: conversational, citation-form, and clear speech. This finding supports the Hypo- & Hyper-speech theory that speakers adjust vocal effort to accommodate hearers' speech perception difficulty. A progressive intelligibility improvement was found across the three speaking styles: clear speech was more intelligible than citation-form speech, citation-form speech was more intelligible than conversational speech, and clear speech was also more intelligible than conversational speech. These findings suggest that the manipulations used to elicit three distinct speaking styles in a laboratory setting were successful. Korean lenis stops showed the least intelligibility improvement among the three Korean stop types, which suggests that lenis stops should be more resistant to intelligibility enhancement efforts in clear speech than aspirated and fortis stops.

Common Speech Database Collection for Telecommunications (통신망환경 한국어 공통음성 DB 구축)

  • Kim Sanghun; Park Moonwhan; Kim Hyunsuk
    • Proceedings of the KSPS conference / 2003.05a / pp.23-26 / 2003
  • This paper presents a common speech database collection for telecommunication applications. Over a three-year project, we will construct very large-scale speech and text databases for speech recognition, speech synthesis, and speaker identification. The common speech database is designed to cover various communication environments as well as the distribution of speakers' sex, age, and region. It consists of Korean continuous digits, isolated words, and sentences that reflect Korean phonetic coverage. It also covers various pronunciation styles, such as read speech, dialogue speech, and semi-spontaneous speech. With the common speech databases, duplication of resources across the Korean speech industry is avoided. This supports domestic speech industries and stimulates the domestic speech technology market.

Rhythmic Differences between Spontaneous and Read Speech of English

  • Kim, Sul-Ki; Jang, Tae-Yeoub
    • Phonetics and Speech Sciences / v.1 no.3 / pp.49-55 / 2009
  • This study investigates whether rhythm metrics can be used to capture the rhythmic differences between spontaneous and read English speech. Transcriptions of spontaneous speech tokens extracted from a corpus were read by three native English speakers to generate the corresponding read speech tokens. The two data sets were compared in terms of seven rhythm measures suggested by previous studies. The results show a significant difference in the values of the vowel-based metrics (VarcoV and nPVI-V) between spontaneous and read speech. This indicates greater variability in vocalic intervals in spontaneous speech than in read speech. The study is especially meaningful in that it demonstrates a way in which speech styles can be differentiated and parameterized in numerical terms.

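The two vowel-based metrics named in the abstract above have standard definitions in the rhythm literature: VarcoV is the standard deviation of vocalic interval durations divided by their mean, times 100, and nPVI-V is the mean of the absolute differences between successive vocalic durations, each normalized by the mean of the pair, times 100. The sketch below follows those definitions. The use of the population standard deviation is an assumption, and the duration values are hypothetical, not data from the paper.

```python
from statistics import mean, pstdev


def varco(durations):
    """Varco: 100 * standard deviation / mean of the interval durations.

    Uses the population standard deviation (an assumption; some studies
    use the sample standard deviation instead).
    """
    return 100.0 * pstdev(durations) / mean(durations)


def npvi(durations):
    """Normalized Pairwise Variability Index over successive intervals."""
    terms = [
        abs(a - b) / ((a + b) / 2.0)
        for a, b in zip(durations, durations[1:])
    ]
    return 100.0 * mean(terms)


# Hypothetical vocalic interval durations in milliseconds.
read_speech = [90.0, 100.0, 95.0, 105.0, 98.0]
spontaneous_speech = [60.0, 140.0, 75.0, 160.0, 50.0]

for label, durations in (("read", read_speech), ("spontaneous", spontaneous_speech)):
    print(label, round(varco(durations), 1), round(npvi(durations), 1))
```

Both metrics normalize by mean duration, which is meant to reduce the effect of overall speaking rate when two speech styles are compared.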

A Study on the Durational Characteristics of Korean Distant-Talking Speech (한국어 원거리 음성의 지속시간 연구)

  • Kim, Sun-Hee
    • MALSORI / no.54 / pp.1-14 / 2005
  • This paper presents the durational characteristics of Korean distant-talking speech, using speech data that consist of 500 distant-talking utterances and 500 normal utterances from 10 speakers (5 males and 5 females). Each file was segmented and labeled manually, and the duration of each segment and each word was extracted. The durational change of distant-talking speech relative to normal speech was then analyzed statistically. The results show that word duration increases in distant-talking speech compared with the normal style, and that the average unvoiced consonantal duration is reduced while the average vocalic duration is increased. Female speakers show a stronger tendency toward lengthening in distant-talking speech. Finally, the study shows that speakers of distant-talking speech can be classified according to their different duration rates.

Monophthong Analysis on a Large-scale Speech Corpus of Read-Style Korean (한국어 대용량발화말뭉치의 단모음분석)

  • Yoon, Tae-Jin; Kang, Yoonjung
    • Phonetics and Speech Sciences / v.6 no.3 / pp.139-145 / 2014
  • The paper describes methods for conducting vowel analysis on a large-scale corpus with the aid of forced alignment and optimal formant ceiling methods. The 'Read Style Corpus of Standard Korean' is used to build the forced alignment system, and a subset of the corpus is used for processing and extracting features for vowel analysis based on the optimal formant ceiling. The results of the vowel analysis are reliable and comparable to those obtained with traditional analytical methods. The findings indicate that the methods adopted here can be extended to more fine-grained analysis without time-consuming manual labeling and without loss of accuracy or reliability.