• Title/Summary/Keyword: utterance level

Search Result 42

Text-Prompt Speaker Verification using Variable Threshold and Sequential Decision (가변 문턱치와 순차결정법을 통한 문맥요구형 화자확인)

  • Ahn, Sung-Joo;Kang, Sun-Mee;Ko, Han-Seok
    • Speech Sciences / v.7 no.4 / pp.41-47 / 2000
  • This paper presents an effective text-prompted speaker verification method that improves verification performance. While various speaker verification methods have already been developed, their effectiveness has not yet been formally proven in terms of achieving an acceptable performance level. Traditional methods have also focused primarily on a single prompted utterance for verification. This paper instead proposes a sequential decision method that uses a variable threshold and handles two utterances for text-prompted speaker verification. Experimental results show that the proposed method outperforms a speaker verification scheme without sequential decision by a factor of up to 3. These results indicate that the proposed method is highly effective and achieves reliable performance suitable for practical applications.
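
The abstract above does not spell out the decision rule, so the following is only a minimal sketch of one way a two-utterance sequential decision could work. The function name, the three thresholds, and the averaging of the two scores are all assumptions made for illustration, not the authors' method: a confident first score is decided immediately, and an ambiguous one triggers a second prompted utterance that is judged against a different threshold.

```python
from typing import Callable, Tuple


def sequential_verify(
    first_score: float,
    second_score_fn: Callable[[], float],
    accept_thr: float = 1.0,   # hypothetical: accept at once at or above this
    reject_thr: float = -1.0,  # hypothetical: reject at once at or below this
    final_thr: float = 0.0,    # hypothetical: threshold for the combined score
) -> Tuple[bool, int]:
    """Return (accepted, number_of_utterances_used)."""
    # Stage 1: decide immediately when the first utterance is clear-cut.
    if first_score >= accept_thr:
        return True, 1
    if first_score <= reject_thr:
        return False, 1
    # Stage 2: the first score is ambiguous, so prompt a second utterance
    # and judge the averaged score against the stage-2 threshold.
    combined = (first_score + second_score_fn()) / 2
    return combined >= final_thr, 2


if __name__ == "__main__":
    print(sequential_verify(1.5, lambda: 0.0))   # decided on the first utterance
    print(sequential_verify(0.2, lambda: 0.4))   # needs the second utterance
```

The second utterance is requested lazily through `second_score_fn`, so a claimant with a clear first score is never asked to speak twice.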

A Modified Viterbi Algorithm for Word Boundary Detection Error Compensation (단어 경계 검출 오류 보정을 위한 수정된 비터비 알고리즘)

  • Chung, Hoon;Chung, Ik-Joo
    • The Journal of the Acoustical Society of Korea / v.26 no.1E / pp.21-26 / 2007
  • In this paper, we propose a modified Viterbi algorithm to compensate for endpoint detection error during the decoding phase of an isolated word recognition task. Since the conventional Viterbi algorithm explores only the search space whose boundaries are fixed to the endpoints of the utterance as segmented by the endpoint detector, recognition performance is highly dependent on the accuracy of endpoint detection. Inaccurately segmented word boundaries lead directly to recognition errors. To mitigate the degradation of recognition accuracy caused by endpoint detection error, we describe an unconstrained search of word boundaries and present an algorithm that explores the search space efficiently. The proposed algorithm was evaluated on an isolated word recognition task under a variety of simulated endpoint detection error cases. It reduced the Word Error Rate (WER) considerably, from 84.4% to 10.6%, while requiring only slightly more computation.
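
To illustrate what relaxing fixed endpoints can look like, here is a minimal sketch of a generic free-endpoint Viterbi search over a left-to-right model. It is not the algorithm from the paper: the function name, the uniform transitions, and the fixed per-frame score `skip_logp` charged for frames left outside the word are all assumptions chosen to keep the example short.

```python
def free_endpoint_viterbi(loglik, skip_logp):
    """Best-scoring word span with unconstrained start and end frames.

    loglik[t][s]: log-likelihood of frame t under state s of a
                  left-to-right model (stay or advance by one state).
    skip_logp:    assumed log-score charged per frame outside the word.
    Returns (best_total_score, (start_frame, end_frame)).
    """
    n_frames, n_states = len(loglik), len(loglik[0])
    neg_inf = float("-inf")
    prev = [neg_inf] * n_states        # best score ending in each state
    prev_start = [None] * n_states     # start frame of that best path
    best, best_span = neg_inf, None

    for t in range(n_frames):
        cur = [neg_inf] * n_states
        cur_start = [None] * n_states
        for s in range(n_states):
            cands = []
            if prev[s] > neg_inf:                  # stay in state s
                cands.append((prev[s], prev_start[s]))
            if s > 0 and prev[s - 1] > neg_inf:    # advance from state s-1
                cands.append((prev[s - 1], prev_start[s - 1]))
            if s == 0:                             # start the word at frame t
                cands.append((skip_logp * t, t))
            if cands:
                score, start = max(cands, key=lambda c: c[0])
                cur[s] = score + loglik[t][s]
                cur_start[s] = start
        # The word may end at frame t; remaining frames are skipped.
        if cur[-1] > neg_inf:
            total = cur[-1] + skip_logp * (n_frames - 1 - t)
            if total > best:
                best, best_span = total, (cur_start[-1], t)
        prev, prev_start = cur, cur_start
    return best, best_span
```

Because the start and end frames are chosen inside the same dynamic program, the search stays linear in the number of frames instead of rerunning Viterbi for every candidate boundary pair.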

A Situation-Based Dialogue Management with Dialogue Examples (대화 예제를 이용한 상황 기반 대화 관리 시스템)

  • Lee, Cheon-Jae;Jung, Sang-Keun;Lee, Geun-Bae
    • Proceedings of the KSPS conference / 2005.11a / pp.113-115 / 2005
  • In this paper, we present POSSDM (POSTECH Situation-Based Dialogue Manager) for a spoken dialogue system, which uses new example-based and situation-based dialogue management techniques to generate appropriate system responses effectively. A spoken dialogue system should generate cooperative responses to smoothly control the dialogue flow with its users. We introduce a new dialogue management technique incorporating dialogue examples and situation-based rules for the EPG (Electronic Program Guide) domain. For system response inference, we automatically construct and index a dialogue example database from a dialogue corpus, and the best dialogue example is retrieved for a proper system response using a query built from the dialogue situation, which includes the current user utterance, dialogue act, and discourse history. When the dialogue corpus is not large enough to cover the domain, we also apply manually constructed situation-based rules, mainly for meta-level dialogue management.

Absolute categories and relative categories (절대범주와 상대범주)

  • Kwon, Kyeong-Won
    • English Language & Literature Teaching / v.8 no.2 / pp.131-150 / 2003
  • The purpose of this paper is to propose two levels of conceptualization of a category, namely an absolute category at the semantic level and a relative category at the pragmatic level, on the basis of Aristotelian category theory and prototype category theory. I do not intend to criticize classical category theory or prototype category theory but to show that these two types of category apply to different worlds. Aristotelian categorization yields an absolute category because it is based on the possible world, called the meta-world, and it has an absolute truth value. The members of an absolute category are presented as a set. There is a clear boundary between members and non-members because they are distinguished by absolute criteria. An absolute category is a semantic conceptualization. This absolute category changes into a relative category when it is applied in the real world. A relative category, which corresponds to a prototype category, is based on the real world, called the object world, and it has a relative truth value. Here individuals are categorized by the cognition and perception of human beings. A relative category is a pragmatic conceptualization. In conclusion, while classical categories, which are called absolute categories, represent sentence meaning, prototype categories, which are called relative categories, represent utterance meaning.

The Effect on Manifesting Group Creativity by Empathy Level of Students in the Elementary Science Class (초등 과학 수업에서 공감능력에 따른 집단 구성이 학생들의 집단 창의성 발현에 미치는 영향)

  • Kim, Kyung-won;Yang, Heesun;Kang, Seong-Joo
    • Journal of Korean Elementary Science Education / v.38 no.1 / pp.1-15 / 2019
  • This study aimed to identify the effects of students' empathic ability on group creativity when elementary school students perform scientific activities designed to elicit group creativity. A total of 12 elementary students from a fifth-grade science club participated in this study. A pretest of the students' empathic ability was used to classify them into three groups: a high-empathy group, a low-empathy group, and a heterogeneous group. The linguistic interaction was analyzed to determine the process of group creativity manifestation; the results were classified into 'metacognitive', 'cognitive', and 'social-communicative'. The group with high empathic ability showed more frequent interaction in monitoring, planning, and divergent thinking. In contrast, the group with a low level of empathy showed many interactions related to regulation, convergent thinking, and noncohesive prosocial interaction. In the group with heterogeneous empathic ability, group creativity utterances across all aspects were relatively more frequent than in the other groups. Based on these results, we confirm the influence of empathy as a strategy to support group creativity and discuss the educational implications.

Speech emotion recognition based on genetic algorithm-decision tree fusion of deep and acoustic features

  • Sun, Linhui;Li, Qiu;Fu, Sheng;Li, Pingan
    • ETRI Journal / v.44 no.3 / pp.462-475 / 2022
  • Although researchers have proposed numerous techniques for speech emotion recognition, its performance remains unsatisfactory in many application scenarios. In this study, we propose a speech emotion recognition model based on a genetic algorithm (GA)-decision tree (DT) fusion of deep and acoustic features. To express emotional information in speech more comprehensively, frame-level deep and acoustic features are first extracted from the speech signal. Next, five kinds of statistical variables of these features are calculated to obtain utterance-level features. The Fisher feature selection criterion is employed to select high-performance features, removing redundant information. In the feature fusion stage, the GA is used to adaptively search for the best feature fusion weight. Finally, using the fused feature, the proposed speech emotion recognition model based on a DT support vector machine model is realized. Experimental results on the Berlin speech emotion database and the Chinese emotion speech database indicate that the proposed model outperforms an average weight fusion method.
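
The step from frame-level to utterance-level features can be sketched as statistical pooling over frames. The abstract does not name the five statistics, so the choice below (mean, standard deviation, maximum, minimum, median) and the function name are assumptions for illustration only.

```python
import statistics
from typing import List, Sequence


def utterance_level_features(frames: Sequence[Sequence[float]]) -> List[float]:
    """Pool per-frame feature vectors into one fixed-length utterance vector.

    frames: one feature vector per frame, all of equal length.
    Returns five statistics per feature dimension, concatenated.
    The five statistics are an assumed choice, not taken from the paper.
    """
    if not frames:
        raise ValueError("at least one frame is required")
    pooled: List[float] = []
    # zip(*frames) regroups the values by feature dimension.
    for values in zip(*frames):
        pooled.extend([
            statistics.fmean(values),   # mean over frames
            statistics.pstdev(values),  # population standard deviation
            max(values),
            min(values),
            statistics.median(values),
        ])
    return pooled


if __name__ == "__main__":
    frames = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
    print(utterance_level_features(frames))
```

Whatever the number of frames, the output length is five times the frame feature dimension, which is what lets utterances of different durations share one classifier input size.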

Alveolar Fricative Sound Errors by the Type of Morpheme in the Spontaneous Speech of 3- and 4-Year-Old Children (자발화에 나타난 형태소 유형에 따른 3-4세 아동의 치경마찰음 오류)

  • Kim, Soo-Jin;Kim, Jung-Mee;Yoon, Mi-Sun;Chang, Moon-Soo;Cha, Jae-Eun
    • Phonetics and Speech Sciences / v.4 no.3 / pp.129-136 / 2012
  • Korean alveolar fricatives are late-developing speech sounds. Most previous research on phonemes elicited sounds using individual words or pseudo-words, but word-level phonological analysis does not always reflect a child's practical articulation ability. There has also been limited research on articulation development that examines speech production by grammatical morpheme, despite its importance in the Korean language. Therefore, this research examines the articulation development and phonological patterns of the /s/ phoneme by morphological type in children's spontaneous conversational speech. The subjects were twenty-two typically developing 3- and 4-year-old Korean children. All children showed normal levels in three screening tests: hearing, vocabulary, and articulation. Spontaneous conversational samples were recorded at the children's homes. The results are as follows. The error rates decreased with increasing age in all morphological contexts. Error percentages within an age group were also significantly lower in lexical morphemes than in grammatical morphemes. The stopping of fricative sounds was the main error pattern in all morphological contexts and decreased as age increased. This research shows that articulation performance can differ significantly by morphological context. The present study provides data that can be used to identify difficult contexts for articulatory evaluation and therapy of alveolar fricative sounds.

Chinese Prosody Generation Based on C-ToBI Representation for Text-to-Speech (음성합성을 위한 C-ToBI기반의 중국어 운율 경계와 F0 contour 생성)

  • Kim, Seung-Won;Zheng, Yu;Lee, Gary-Geunbae;Kim, Byeong-Chang
    • MALSORI / no.53 / pp.75-92 / 2005
  • Prosody modeling is critical in developing text-to-speech (TTS) systems where speech synthesis is used to automatically generate natural speech. In this paper, we present a prosody generation architecture based on Chinese Tone and Break Index (C-ToBI) representation. ToBI is a multi-tier representation system based on linguistic knowledge to transcribe events in an utterance. A TTS system that adopts ToBI as an intermediate representation is known to exhibit higher flexibility, modularity, and domain/task portability than TTS systems with direct prosody generation. However, corpus preparation is very costly for practical-level performance because a ToBI-labeled corpus has to be manually constructed by many prosody experts, and accurate statistical prosody modeling normally requires a large amount of data. This paper proposes a new method which transcribes the C-ToBI labels automatically in Chinese speech. We model Chinese prosody generation as a classification problem and apply conditional Maximum Entropy (ME) classification to this problem. We empirically verify the usefulness of various natural language and phonology features in order to build well-integrated features for the ME framework.

Syllable-timing Interferes with Korean Learners' Speech of Stress-timed English

  • Lee, Ok-Hwa;Kim, Jong-Mi
    • Speech Sciences / v.12 no.4 / pp.95-112 / 2005
  • We investigate Korean learners' speech timing in English before and after instruction, in comparison with native speech, in an attempt to resolve disagreements in the literature as to whether speech timing is measurable (Lehiste, 1977; Roach, 1982; Dauer, 1983 vs. Low et al., 2000; Yun, 2002; Jian, 2004). We measured the pair-wise variability between adjacent stressed and unstressed syllables within a foot, as well as that among adjacent feet, in approximately 555 English sentences read by 29 native speakers and 41 Korean learners at the intermediate proficiency level. The results show that, compared with native American English, Korean learner speech before instruction has significantly (p<.001) smaller pair-wise variability between adjacent stressed and unstressed syllables within a foot and significantly (p=.01) larger variability among adjacent feet within the utterance. The learner speech after instruction showed significant (p=.01) improvement in the pair-wise variability of syllable sequences toward native speech values. The variability among adjacent feet became progressively smaller from learner speech before instruction, to learner speech after instruction, to native speech (p=.03). We thus conclude that the speech timing difference between Korean English and American English is measurable in terms of the duration of stressed and unstressed syllables, and that the latter is stress-timed whereas the former shows interference from syllable timing.
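
The abstract does not state which pair-wise variability measure was computed. As a minimal sketch, the function below implements the normalized Pairwise Variability Index (nPVI) commonly used in the rhythm literature, which averages the duration difference between each pair of adjacent intervals relative to the pair's mean duration. Treating this as the measure in question is an assumption, and the function name is illustrative.

```python
from typing import Sequence


def npvi(durations: Sequence[float]) -> float:
    """Normalized Pairwise Variability Index over successive durations.

    nPVI = 100 * mean(|d_k - d_{k+1}| / ((d_k + d_{k+1}) / 2))
    durations: positive interval durations (for example syllables or
               feet), in any consistent unit.
    """
    if len(durations) < 2:
        raise ValueError("at least two durations are required")
    terms = [
        abs(a - b) / ((a + b) / 2)      # difference relative to pair mean
        for a, b in zip(durations, durations[1:])
    ]
    return 100 * sum(terms) / len(terms)


if __name__ == "__main__":
    # Equal durations give no variability at all.
    print(npvi([100, 100, 100]))
    # Alternating long and short intervals give high variability.
    print(npvi([100, 300, 100]))
```

Under this measure, strictly equal successive durations score 0 and strongly alternating long and short durations score high, which is the direction of contrast the abstract reports between learner and native speech within a foot.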

The Study of Pragmatic Functions of '-ketun(yo)' for Korean grammar teaching on a discourse level (담화 차원의 한국어 문법 교육을 위한 '-거든(요)'의 화용적 기능 분석 연구)

  • Han, Halim
    • Journal of Korean language education / v.28 no.2 / pp.209-233 / 2017
  • The purpose of this study is to analyze the pragmatic functions of '-ketun(yo)' in discourse, in relation to the context of communication, based on actual conversations of native Korean speakers. As discourse is closely related to context, the contextual factors surrounding the discourse should be actively considered in order to reveal the function of grammar expressed in it. There is also a need to consider grammatical functions from the standpoint of the language user, who is the subject of interaction in the discourse. Based on this necessity, we analyzed the pragmatic functions of '-ketun(yo)'. The analysis shows that '-ketun(yo)' has a great influence on the formation and expansion of the shared context in communication. The shared context is expanded through generative mutual knowledge and priori mutual knowledge. In the conversation analysis, '-ketun(yo)' was used at a high frequency in the formation and expansion of generative mutual knowledge. In addition, '-ketun(yo)' appeared to have a discourse cohesion function that binds topics to other topics. When '-ketun(yo)' draws on priori mutual knowledge, it can be used as a sign that promotes solidarity between the speaker and the listener. This study is significant in that it examines the pragmatic functions of '-ketun(yo)' in relation to the context of communication based on actual utterances.