• Title/Summary/Keyword: utterance

The Study of Breath Competence Depending on Utterance Condition by Healthy Speakers: a Preliminary Study (발화조건에 따른 정상 성인의 호흡 능력 차이 비교: 예비연구)

  • Lee, In-Ae; Lee, Hye-Eun; Hwang, Young-Jin
    • Phonetics and Speech Sciences / v.4 no.2 / pp.115-120 / 2012
  • This study compared breath competence across three utterance conditions: reading a passage aloud, spontaneous speech, and singing. We tested 15 normal females (mean age $24{\pm}4.4$) and measured breath competence with an objective aero-mechanical instrument, the PAS (Phonatory Aerodynamic System, model 6600, KAY Electronics, Inc). Breathing cycles of inspiration and expiration were measured by breath group number, breath group duration, and the ratio of inspiration to expiration. The breath group number and the breath group duration showed no significant difference across conditions; the only significant difference was in the ratio of inspiration to expiration. Singing produced the most varied ratio of inspiration to expiration, followed by reading a text aloud and then spontaneous speech. The average frequency rates and maximum intensity levels also varied with the utterance condition. This shows that breath competence and phonation competence are closely interrelated.

Initial-syllable lengthening of an utterance-internal phrase in Korean

  • Yun, Ilsung
    • Phonetics and Speech Sciences / v.6 no.2 / pp.141-151 / 2014
  • This study reports anti-hierarchical initial-syllable lengthening of an utterance-internal phrase in Korean. That is, a phrase-initial syllable (e.g., /a/ of "apa-do" or /ma/ of "mapa-do") starting with a voiced phoneme (i.e., a vowel or voiced consonant) is significantly longer when it is preceded by another phrase without a pause than when it begins an utterance or follows an utterance-internal pause. The phenomenon was examined with regard to two other factors: (1) tempo and (2) tenseness of the consonant (/p, $p^{\prime}$, $p^h$/) following the target syllable /a/. First, the effect of tempo on initial lengthening was not significant. Statistical significance aside, however, a tendency was observed: the slower the tempo, the greater the lengthening. By contrast, the faster the tempo, the higher the ratio (%) of lengthening. Second, contrary to our expectations, initial-syllable lengthening was even greater before the tense stops /$p^{\prime}$, $p^h$/ than before the lax stop /p/ regardless of tempo, and the difference was especially marked in the ratio (%), which means that initial lengthening is free of the pre-consonantal vowel shortening effect. Final-syllable lengthening is a pre-boundary marker, whereas initial-syllable lengthening is regarded as a post-boundary marker of a phrase.
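The two tempo tendencies in this abstract (greater lengthening at slow tempo, higher lengthening ratio at fast tempo) are compatible because one is an absolute difference and the other is relative to the baseline duration. A minimal sketch follows; the durations are hypothetical values invented for illustration and are not taken from the paper, and the function name `lengthening` is likewise an assumption.

```python
def lengthening(baseline_ms, lengthened_ms):
    """Return (absolute lengthening in ms, lengthening ratio in %)."""
    diff = lengthened_ms - baseline_ms
    return diff, 100.0 * diff / baseline_ms

# Hypothetical syllable durations in milliseconds (NOT data from the paper).
slow = lengthening(baseline_ms=200.0, lengthened_ms=240.0)
fast = lengthening(baseline_ms=100.0, lengthened_ms=130.0)

# Slow tempo: larger absolute lengthening; fast tempo: larger ratio.
print(slow, fast)
```

With these made-up numbers, the slow-tempo syllable gains more milliseconds while the fast-tempo syllable gains a larger share of its own baseline.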

Effective Korean Speech-act Classification Using the Classification Priority Application and a Post-correction Rules (분류 우선순위 적용과 후보정 규칙을 이용한 효과적인 한국어 화행 분류)

  • Song, Namhoon; Bae, Kyoungman; Ko, Youngjoong
    • Journal of KIISE / v.43 no.1 / pp.80-86 / 2016
  • A speech-act is the behavior a user intends by an utterance. Speech-act classification is important in a dialogue system, and machine learning and rule-based methods have mainly been used for it. In this paper, we propose a speech-act classification method that combines a support vector machine (SVM) with transformation-based learning (TBL). The user's utterance is first classified by the SVM, which is applied preferentially to categories with a low utterance rate in the training data. Next, when an utterance receives negative scores for all categories, it is passed to a rule-based correction phase. Our method achieved higher performance than the baseline system, along with error reduction.
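The two-stage flow described in this abstract can be sketched as below. This is one possible reading under stated assumptions, not the authors' implementation: the per-category scores are assumed to come from one-vs-rest SVM decision functions computed elsewhere, "priority" is interpreted as checking the rarest training categories first, and the correction rules are hand-written placeholders, whereas the paper learns its rules with TBL. All names (`classify`, `train_counts`, `rules`) are hypothetical.

```python
def classify(utterance, scores, train_counts, rules):
    """Pick a speech-act label from per-category scores.

    scores       -- dict: category -> decision score (assumed SVM output)
    train_counts -- dict: category -> number of training utterances
    rules        -- list of (predicate, category) correction rules
    """
    # Stage 1: visit categories from rarest to most frequent and accept
    # the first one whose score is positive.
    for cat in sorted(scores, key=lambda c: train_counts[c]):
        if scores[cat] > 0:
            return cat
    # Stage 2: every score is non-positive, so apply correction rules.
    for predicate, cat in rules:
        if predicate(utterance):
            return cat
    # No rule fired: fall back to the least negative category.
    return max(scores, key=scores.get)

# Hypothetical categories, counts, and a single toy rule.
counts = {"inform": 500, "ask-ref": 120, "request": 50}
rules = [(lambda u: u.endswith("?"), "ask-ref")]

a = classify("book a room", {"inform": 0.9, "ask-ref": -0.2, "request": 0.4}, counts, rules)
b = classify("where is it?", {"inform": -0.1, "ask-ref": -0.3, "request": -0.5}, counts, rules)
print(a, b)
```

In the first call the rare category "request" wins even though "inform" has the higher score, which is the effect of the priority ordering; in the second call all scores are negative, so the rule decides.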

The Effect of Preceding Utterance on the User Experience in the Voice Agent Interactions - Focus on the Conversational Types in the Smart Home Context - (음성 에이전트 상호작용에서 선행 발화가 사용자 경험에 미치는 영향 - 스마트홈 맥락에서 대화 유형 조건을 중심으로 -)

  • Kang, Yeseul; Na, Gyounghwa; Choi, Junho
    • The Journal of the Convergence on Culture Technology / v.7 no.1 / pp.620-631 / 2021
  • This study aims to test the effect of a voice agent's preceding utterance type on the user experience in smart home contexts, by conversation type. Based on two types of conversation (task-oriented vs. relationship-oriented) and two types of utterance (preceding vs. response), four scenarios were designed for an experimental study. A total of 62 participants were divided into two groups by utterance type, and each group was exposed to the scenarios for both conversation types. Likeability, psychological reactance, and perceived intelligence were measured as the user experience of the conversational agent. The results showed main effects on likeability in task-oriented conversations and on psychological reactance in preceding utterances. The interaction effect demonstrated that preceding utterances improved likeability and perceived intelligence in task-oriented conversations.

On-Line Blind Channel Normalization for Noise-Robust Speech Recognition

  • Jung, Ho-Young
    • IEIE Transactions on Smart Processing and Computing / v.1 no.3 / pp.143-151 / 2012
  • A new data-driven method is proposed for designing a blind modulation frequency filter that suppresses slow-varying noise components. The method is based on temporal local decorrelation of the feature vector sequence and is applied on an utterance-by-utterance basis. Whereas conventional modulation frequency filtering approaches use the same form regardless of task and environment conditions, the proposed method provides an adaptive modulation frequency filter for each utterance that outperforms conventional methods. In addition, the method ultimately performs channel normalization in the feature domain when applied to log-spectral parameters. Performance was evaluated by speaker-independent isolated-word recognition experiments under additive noise environments. The proposed method achieved outstanding improvement for speech recognition in environments with significant noise and was also effective across a range of feature representations.
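For orientation, the sketch below shows per-utterance mean subtraction, a conventional fixed form of channel normalization, and not the adaptive filter proposed in this paper. It relies on the fact that a stationary convolutional channel becomes an additive constant in the log-spectral domain, so removing each dimension's mean over the utterance cancels it. The function name and the toy feature values are assumptions made for illustration.

```python
def mean_normalize(frames):
    """Subtract the per-utterance mean from every feature dimension.

    frames -- list of equal-length feature vectors (one per time frame),
              assumed to be log-spectral or cepstral parameters.
    """
    n = len(frames)
    dims = len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dims)]
    return [[f[d] - means[d] for d in range(dims)] for f in frames]

# Toy utterance: 3 frames x 2 dimensions (hypothetical values).
clean = [[1.0, 2.0], [3.0, 0.0], [2.0, 4.0]]
# A constant channel offset added to every frame in the log domain.
channel = [0.5, -1.0]
distorted = [[x + c for x, c in zip(f, channel)] for f in clean]

norm_clean = mean_normalize(clean)
norm_distorted = mean_normalize(distorted)
print(norm_clean == norm_distorted or "equal up to rounding")
```

Mean subtraction removes only the zero modulation-frequency component; a data-driven filter designed per utterance, as the abstract describes, would shape the whole modulation spectrum instead.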

Adaptive Channel Normalization Based on Infomax Algorithm for Robust Speech Recognition

  • Jung, Ho-Young
    • ETRI Journal / v.29 no.3 / pp.300-304 / 2007
  • This paper proposes a new data-driven high-pass approach that suppresses slow-varying noise components. Conventional high-pass approaches are based on the idea of decorrelating the feature vector sequence and aim for adaptability to various conditions. The proposed method is based on temporal local decorrelation using information-maximization theory for each utterance. It is performed on an utterance-by-utterance basis, which provides an adaptive channel normalization filter for each condition. The performance of the proposed method is evaluated by isolated-word recognition experiments with channel distortion. Experimental results show that the proposed method yields outstanding improvement for channel-distorted speech recognition.

Statistical Approaches to Convert Pitch Contour Based on Korean Prosodic Phrases (한국어 운율구 기반의 피치궤적 변환의 통계적 접근)

  • Lee, Ki-Young
    • The Journal of the Acoustical Society of Korea / v.23 no.1E / pp.10-15 / 2004
  • In speech conversion from a source speaker to a target speaker, it is important that the pitch contour of the source speaker's utterance be converted into that of the target speaker, because the pitch contour of an utterance plays an important role in expressing the speaker's individuality and the meaning of the utterance. This paper describes statistical algorithms for pitch contour conversion in Korean. Pitch contour conversion is investigated at two levels of prosodic phrases: the intonational phrase and the accentual phrase. The basic algorithm is Gaussian normalization [7] within an intonational phrase. The first presented algorithm combines it with a declination line of the pitch contour in an intonational phrase. The second is Gaussian normalization within accentual phrases, to compensate for local pitch variations. Experimental results show that Gaussian normalization within accentual phrases is significantly more accurate than the other two algorithms, which operate on intonational phrases.
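Gaussian normalization of a pitch contour is commonly written as $f_t = (f_s - \mu_s)/\sigma_s \cdot \sigma_t + \mu_t$, which maps the source speaker's F0 statistics onto the target speaker's. The sketch below applies that formula to one phrase; it is an illustration under assumptions rather than the paper's code: it works on F0 in Hz over voiced frames only, estimates the source statistics from the phrase itself, and uses made-up values. Whether the paper normalizes in Hz or log-F0 is not stated in the abstract.

```python
from statistics import mean, pstdev

def gaussian_normalize(source_f0, target_mean, target_std):
    """Map a source F0 contour onto the target speaker's mean and spread.

    source_f0 -- voiced-frame F0 values for one phrase (assumed non-constant)
    """
    src_mean = mean(source_f0)
    src_std = pstdev(source_f0)  # population standard deviation
    return [(f - src_mean) / src_std * target_std + target_mean
            for f in source_f0]

# Hypothetical contour (Hz) and target statistics, for illustration only.
source = [100.0, 120.0, 140.0]
converted = gaussian_normalize(source, target_mean=200.0, target_std=20.0)
print(converted)
```

Applying the same function separately to each accentual phrase, with phrase-level target statistics, would correspond to the finer-grained variant the abstract describes; the phrase segmentation itself is outside this sketch.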

The Restructuring in English Utterance and Words and a Use of Textsetting (영어 발화와 가사 리듬의 재구조와 리듬보의 활용)

  • Kim Key-Seop
    • MALSORI / no.40 / pp.29-49 / 2000
  • This study has two aims: one is to clarify restructuring in English utterances, and the other is to use text-setting to help learners become accustomed to English rhythm and pronunciation. Clitics prove to play a crucial role in English restructuring and are found to attach either to the preceding or to the following head or host, thus forming, respectively, an en-cliticized rhythm (trochee) and a pro-cliticized rhythm (iambus). En-cliticization proves to be preferred to pro-cliticization in most types of English rhythm. Accordingly, restructuring turns out to occur at all levels of the Prosodic Hierarchy. That is, syllables, words, and clitic groups are restructured in poetry as well as in song lyrics, which implies that restructuring is needed at every level of the Prosodic Hierarchy, from the syllable to the utterance. The present study suggests that rhythmic text-setting can help learners of English become accustomed to stress-timed rhythm as well as to such changes in pronunciation as reductions, deletions, resolutions, contractions, and rhythms in English.

Prosodic Characteristics of Politeness in Korean (한국어에서의 공손함을 나타내는 운율적 특성에 관한 연구)

  • Ko Hyun-ju; Kim Sang-Hun; Kim Jong-Jin
    • MALSORI / no.45 / pp.15-22 / 2003
  • This is a preliminary study toward improving the naturalness of dialog TTS systems. As major characteristics of politeness in Korean, temporal features (total utterance duration, speech rate, and duration of utterance-final syllables) and F0 features (mean F0, boundary tone pattern, F0 range) were examined through acoustic analysis of recordings of semantically neutral sentences spoken by ten professional voice actors under two utterance-type conditions, normal and polite. The results show that the temporal characteristics differed significantly by utterance type, but the F0 characteristics did not.

SWAPPING NATIVE AND NON-NATIVE SPEAKERS' PROSODY USING THE PSOLA ALGORITHM

  • Yoon Kyu-Chul
    • Proceedings of the KSPS conference / 2006.05a / pp.77-81 / 2006
  • This paper presents a technique for imposing the prosodic features of a native speaker's utterance onto the same sentence uttered by a non-native speaker. Three acoustic aspects of the prosodic features were considered: the fundamental frequency (F0) contour, segmental durations, and the intensity contour. The F0 contour and the segmental durations of the native speaker's utterance were imposed on the non-native speaker's utterance using the PSOLA (pitch-synchronous overlap and add) algorithm [1] implemented in Praat [2]. The intensity contour transfer was also done in Praat. The technique of transferring one or more of these prosodic features is described, and its implications for language education are discussed.
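The duration-transfer step can be pictured as computing one stretch factor per aligned segment, which a PSOLA-style resynthesis would then apply. The sketch below covers only that bookkeeping and does not call Praat; it assumes both utterances are already segmented into the same sequence of segments, and the function name and durations are hypothetical.

```python
def duration_factors(native_durs, nonnative_durs):
    """Per-segment stretch factors that give the non-native utterance
    the native speaker's segmental durations.

    Both lists hold durations in seconds for the same segment sequence.
    A factor above 1 lengthens a segment; below 1 shortens it.
    """
    if len(native_durs) != len(nonnative_durs):
        raise ValueError("segment sequences must align one-to-one")
    return [n / m for n, m in zip(native_durs, nonnative_durs)]

# Hypothetical segment durations in seconds (not from the paper).
factors = duration_factors([0.10, 0.20, 0.15], [0.20, 0.10, 0.15])
print(factors)
```

F0 transfer would additionally require time-warping the native contour onto the non-native time axis, segment by segment, before resynthesis.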
