Speech Perception and Production of English Postvocalic Voicing by Korean and English Speakers

  • Chang, Woo-Hyeok
    • Speech Sciences
    • v.13 no.2
    • pp.107-120
    • 2006
  • The main purpose of this study is to investigate whether Korean learners can use vowel duration as a cue to distinguish voicing contrasts in word-final consonants in English. Given that the Korean group performed much better on the auditory (AX discrimination) task than on the identification or production task, we conclude that the AX discrimination task taps a different level of perception. In particular, the AX discrimination task can be performed at the auditory or phonetic level, where differences in vowel length are still encoded in the representation. By contrast, the identification and production tasks probe the mental representation of vowel length and voicing. It was also found that Korean speakers stored neither vowel length nor voicing in their memorized representations and had not internalized lengthening of the preceding vowel as a rule for differentiating the voicing contrasts of final consonants, even though they were able to detect the acoustic differences in vowel duration when tested with an appropriate task.

Gradient Reduction of $C_1$ in /pk/ Sequences

  • Son, Min-Jung
    • Speech Sciences
    • v.15 no.4
    • pp.43-60
    • 2008
  • Instrumental studies (e.g., aerodynamic, EPG, and EMMA) have shown that the first of two stops in a sequence can sometimes be reduced articulatorily in time and space, either gradiently or categorically. The current EMMA study examines possible factors, both linguistic (e.g., speech rate, word boundary, and prosodic boundary) and paralinguistic (e.g., natural context and repetition), that may induce gradient reduction of $C_1$ in /pk/ cluster sequences. EMMA data were collected from five Seoul Korean speakers. The results show that gradient reduction of lip aperture seldom occurs and is highly restricted in both speaker frequency and token frequency. The results also suggest that place assimilation is not a lexical process, implying that speakers have not yet fully phonologized this process at the abstract level.
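The abstract does not say how reduction of $C_1$ was quantified. As a minimal sketch, assume lip aperture (LA) is computed as the Euclidean distance between the upper- and lower-lip EMMA sensors, and that a token's reduction is expressed relative to an unreduced reference /p/ from the same speaker. The index `reduction_ratio` is hypothetical and only illustrates what gradient (intermediate) versus categorical (0 or 1) reduction would look like numerically.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (x, y) sensor position in the midsagittal plane, e.g. in mm


def lip_aperture(upper: Sequence[Point], lower: Sequence[Point]) -> List[float]:
    """Frame-by-frame Euclidean distance between the upper- and lower-lip sensors."""
    if len(upper) != len(lower):
        raise ValueError("sensor tracks must have the same number of frames")
    return [math.dist(u, l) for u, l in zip(upper, lower)]


def reduction_ratio(la_token: Sequence[float], la_reference: Sequence[float]) -> float:
    """Hypothetical gradient index of C1 lip reduction.

    0.0: the token's minimum aperture reaches the reference (full closure) minimum.
    1.0: the token's minimum aperture stays at the reference's maximum opening,
         i.e. no visible closing gesture. Values in between mean gradient reduction.
    """
    ref_min, ref_max = min(la_reference), max(la_reference)
    span = ref_max - ref_min
    if span == 0:
        raise ValueError("reference token shows no lip movement")
    ratio = (min(la_token) - ref_min) / span
    return max(0.0, min(1.0, ratio))  # clamp to [0, 1]
```

Under this assumed index, tokens clustering near 0 and 1 would point to categorical behaviour, while values spread in between would point to gradient reduction.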

Performance improvement of text-dependent speaker verification system using blind speech segmentation and energy weight (Blind speech segmentation과 에너지 가중치를 이용한 문장 종속형 화자인식기의 성능 향상)

  • Kim Jung-Gon;Kim Hyung Soon
    • no.47
    • pp.131-140
    • 2003
  • We propose a new method of generating client models for an HMM-based text-dependent speaker verification system when only a small amount of training data is available. Statistical methods such as the segmental K-means algorithm are widely used to build client models, but they do not guarantee the quality or reliability of a model when only limited data are available. In this paper, we propose blind speech segmentation based on the level-building DTW algorithm as an alternative way to build a client model from limited data. In addition, since voiced sounds carry much more speaker-specific information than unvoiced sounds and also have higher energy, we propose a new score evaluation method that raises the observation probability to the power of a weighting factor estimated from the normalized log energy. Our experiment shows that the proposed methods outperform a conventional HMM-based speaker verification system.
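The energy-weighted score can be sketched as follows. Raising a probability to a power is equivalent to scaling its logarithm, so the score becomes a weighted sum of frame log-likelihoods. The abstract does not give the mapping from normalized log energy to the weight; min-max normalization to [0, 1] and dividing by the sum of weights are assumptions made here, and the function names are hypothetical.

```python
from typing import List, Sequence


def energy_weights(log_energy: Sequence[float]) -> List[float]:
    """Assumed normalisation: min-max scale per-frame log energy to [0, 1]."""
    lo, hi = min(log_energy), max(log_energy)
    if hi == lo:
        return [1.0] * len(log_energy)  # flat energy: weight all frames equally
    return [(e - lo) / (hi - lo) for e in log_energy]


def weighted_log_score(frame_log_probs: Sequence[float], log_energy: Sequence[float]) -> float:
    """Energy-weighted utterance score.

    Raising each observation probability b(o_t) to the power w_t equals
    multiplying log b(o_t) by w_t, so the score is sum_t w_t * log b(o_t),
    divided here by sum_t w_t so that utterances of different length compare.
    """
    if len(frame_log_probs) != len(log_energy):
        raise ValueError("need exactly one log-energy value per frame")
    weights = energy_weights(log_energy)
    return sum(w * lp for w, lp in zip(weights, frame_log_probs)) / sum(weights)
```

Because low-energy frames (mostly unvoiced or silent) receive small weights, the score is dominated by high-energy voiced frames, which matches the intuition the abstract gives for the method.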

An Experimental Study on Focus Structures of English Utterances by Native Speakers and Korean Learners: Focusing on Narrow and Broad Focus (원어민 화자와 한국인 학습자 영어 발화의 초점구조에 대한 실험음성학적 연구;협의초점과 광의초점을 중심으로)

  • Choi, Kyung-Min;Jang, Tae-Yeoub
    • Proceedings of the KSPS conference
    • 2006.11a
    • pp.75-79
    • 2006
  • In this study, we investigate how focus is realized in English utterances produced by native speakers of English and by Korean learners. Whereas previous studies have dealt mainly with the functional aspects of focus as part of intonational structure, we attempt to provide more quantitative information on F0 and to determine the extent to which Korean learners distinguish focus types in their English utterances. On test sentences designed to be disambiguated by correct focus realization, we find that even advanced-level Korean learners, unlike native speakers, hardly use F0 to clarify the specific meaning of English utterances.

Pronunciation Lexicon Optimization with Applying Variant Selection Criteria (발음 변이의 발음사전 포함 결정 조건을 통한 발음사전 최적화)

  • Jeon, Je-Hun;Chung, Min-Hwa
    • Proceedings of the KSPS conference
  • This paper describes how a domain-dependent pronunciation lexicon is generated and optimized for Korean large vocabulary continuous speech recognition (LVCSR). At the lexicon level, pronunciation variation is usually modeled by adding pronunciation variants to the lexicon. We propose two criteria for selecting which pronunciation variants to include in the lexicon: (i) likelihood and (ii) frequency. The experiment proceeds in three steps. First, variants are generated with knowledge-based rules. Second, we generate domain-dependent lexica that include varying numbers of pronunciation variants according to the proposed criteria. Finally, the word error rate (WER) and real-time factor (RTF) are measured for each lexicon. Applying the variant-pruning criteria yields a 0.72% WER reduction. Moreover, the RTF does not degrade even though the average number of variants is higher than in the compared lexica.
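The two selection criteria can be illustrated with a small filter over per-variant statistics. The abstract does not specify how the criteria combine or what thresholds were used; requiring both criteria, the threshold names `min_loglik` and `min_rel_freq`, the statistics format, and the fallback to the most frequent variant are all assumptions in this sketch.

```python
from typing import Dict, List, Tuple

# Hypothetical per-variant statistics gathered from forced alignment of training data:
# variant -> (average acoustic log-likelihood per frame, number of times it was chosen)
VariantStats = Dict[str, Tuple[float, int]]


def select_variants(
    stats: Dict[str, VariantStats], min_loglik: float, min_rel_freq: float
) -> Dict[str, List[str]]:
    """Keep a variant only if it passes both the likelihood and the frequency criterion."""
    lexicon: Dict[str, List[str]] = {}
    for word, variants in stats.items():
        total = sum(count for _, count in variants.values())
        kept = [
            variant
            for variant, (loglik, count) in variants.items()
            if loglik >= min_loglik and total > 0 and count / total >= min_rel_freq
        ]
        if not kept:
            # Never leave a word without a pronunciation: fall back to the most frequent one.
            kept = [max(variants, key=lambda v: variants[v][1])]
        lexicon[word] = kept
    return lexicon
```

Sweeping the two thresholds yields lexica with different numbers of variants, which could then be compared on WER and RTF as in the third step of the experiment.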

Design and Manufacture of a Device for the Recognition of Long Vowels (장모음 인식장치 설계 제작)

  • 구용회
    • Journal of the Korean Institute of Telematics and Electronics T
    • v.35T no.3
    • pp.9-14
    • 1998
  • Recognition of long vowels is carried out by electronic circuits. A level compressor transforms the speech waveform into serial pulses, and the resulting pulses carry the information needed to distinguish the vowels. The pulses are sampled by a register that captures the serial signal over one pitch period of the vowel as a unit. Timing-control pulses, such as the sampling pulses, are generated from the peak pulses in the speech waveform. The parallel data in the register are then mapped to a phonetic symbol by a decision-making circuit that implements IF-THEN rules.
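A software analogue of the circuit's signal path may make the description easier to follow. This sketch assumes a simple comparator for the level compressor, local maxima above a height threshold as the peak pulses, and an 8-bit register; the IF-THEN rules in `classify` and its class labels are placeholders, since the abstract does not give the actual rules.

```python
from typing import List, Sequence


def peak_indices(samples: Sequence[float], min_height: float) -> List[int]:
    """Local maxima at or above min_height, used as pitch-period markers."""
    return [
        i for i in range(1, len(samples) - 1)
        if samples[i] >= min_height and samples[i - 1] < samples[i] >= samples[i + 1]
    ]


def level_compress(samples: Sequence[float], threshold: float = 0.0) -> List[int]:
    """Comparator stage: 1 while the waveform is above the threshold, else 0."""
    return [1 if s > threshold else 0 for s in samples]


def load_register(pulses: Sequence[int], peaks: Sequence[int], n_bits: int) -> List[int]:
    """Sample the pulse train at n_bits evenly spaced points over one pitch period
    (between the first two peak markers), like a serial-in register."""
    if len(peaks) < 2:
        raise ValueError("need at least two peaks to delimit one pitch period")
    start, end = peaks[0], peaks[1]
    period = end - start
    return [pulses[start + (i * period) // n_bits] for i in range(n_bits)]


def classify(register: Sequence[int]) -> str:
    """Placeholder IF-THEN rules on the register contents; real rules would be
    derived from measured pulse patterns for each long vowel."""
    ones = sum(register)
    if ones >= 6:
        return "class A"
    if ones >= 3:
        return "class B"
    return "class C"
```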

The Usage of Phoneme Duration Information for Rejecting Garbage Sentences (소음문장 제거를 위한 음소지속시간 사용)

  • Koo Myoung-Wan;Kim Ho-Kyoung;Park Sung-Joon;Kim Jae-In
    • Proceedings of the KSPS conference
    • 2003.05a
    • pp.219-222
    • 2003
  • In this paper, we study the use of phoneme duration information for rejecting garbage sentences. First, we build a phoneme duration model in a speech recognition system based on decision-tree state tying, assuming that phone duration follows a Gamma distribution. Next, we build a verification module that uses a word-level confidence measure. Finally, we conduct a comparative study of phoneme duration using a speech database collected from the live system, consisting of OOT (out-of-task) and ING (in-grammar) utterances. Using phone duration information improves the OOT recognition rate by 46%, and combining it with the utterance verification module reduces the error rate by a further 8.4%.
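A minimal sketch of Gamma-distributed duration scoring follows. The Gamma log-density is standard; fitting by the method of moments, averaging the log-likelihood over phones, and a single rejection threshold are assumptions for illustration, not necessarily what the authors' system does.

```python
import math
from typing import Dict, Sequence, Tuple

GammaParams = Tuple[float, float]  # (shape k, scale theta)


def fit_gamma(durations: Sequence[float]) -> GammaParams:
    """Method-of-moments fit: k = mean^2 / var, theta = var / mean."""
    n = len(durations)
    mean = sum(durations) / n
    var = sum((d - mean) ** 2 for d in durations) / n
    if var == 0:
        raise ValueError("durations must not all be equal")
    return mean * mean / var, var / mean


def gamma_logpdf(x: float, shape: float, scale: float) -> float:
    """log of x^(k-1) * exp(-x/theta) / (Gamma(k) * theta^k), for x > 0."""
    return (shape - 1) * math.log(x) - x / scale - math.lgamma(shape) - shape * math.log(scale)


def duration_score(segments: Sequence[Tuple[str, float]], models: Dict[str, GammaParams]) -> float:
    """Average duration log-likelihood over the recognised (phone, duration) segments."""
    if not segments:
        raise ValueError("no segments to score")
    return sum(gamma_logpdf(d, *models[phone]) for phone, d in segments) / len(segments)


def accept(segments: Sequence[Tuple[str, float]], models: Dict[str, GammaParams],
           threshold: float) -> bool:
    """Reject the sentence as garbage when its phones have implausible durations."""
    return duration_score(segments, models) >= threshold
```

In a real recognizer the durations would come from the alignment (for example in frames), and decision-tree state tying would decide which units share a duration model; both are outside this sketch.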

A Situation-Based Dialogue Management with Dialogue Examples (대화 예제를 이용한 상황 기반 대화 관리 시스템)

  • Lee, Cheon-Jae;Jung, Sang-Keun;Lee, Geun-Bae
    • Proceedings of the KSPS conference
    • 2005.11a
    • pp.113-115
    • 2005
  • In this paper, we present POSSDM (POSTECH Situation-Based Dialogue Manager), a dialogue manager for spoken dialogue systems that uses new example-based and situation-based dialogue management techniques to generate appropriate system responses effectively. A spoken dialogue system should generate cooperative responses in order to control the dialogue flow with users smoothly. We introduce a new dialogue management technique that combines dialogue examples and situation-based rules for the EPG (Electronic Program Guide) domain. To infer the system response, we automatically construct and index a dialogue example database from a dialogue corpus; the best-matching example is then retrieved with a query built from the current dialogue situation, which comprises the current user utterance, dialogue act, and discourse history. When the dialogue corpus does not sufficiently cover the domain, we also apply manually constructed situation-based rules, mainly for meta-level dialogue management.
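The retrieval step with a rule-based fallback can be sketched as below. The similarity measure (word-overlap Jaccard plus a bonus for a matching previous dialogue act), the score threshold, the example fields, and the fallback response are hypothetical choices; POSSDM's actual indexing and query matching are not described in the abstract.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class DialogueExample:
    dialogue_act: str     # hypothetical tag, e.g. "search_program"
    utterance: str        # user utterance from the corpus
    previous_act: str     # simplified stand-in for the discourse history
    system_response: str  # response the system gave in the corpus


def build_index(examples: List[DialogueExample]) -> Dict[str, List[DialogueExample]]:
    """Index the example database by dialogue act."""
    index: Dict[str, List[DialogueExample]] = defaultdict(list)
    for example in examples:
        index[example.dialogue_act].append(example)
    return dict(index)


def _word_overlap(a: str, b: str) -> float:
    """Jaccard similarity between the word sets of two utterances."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0


def respond(index: Dict[str, List[DialogueExample]], act: str, utterance: str,
            previous_act: str, min_score: float = 0.3) -> str:
    """Return the response of the best-matching example, else a rule-based fallback."""
    best: Optional[DialogueExample] = None
    best_score = -1.0
    for example in index.get(act, []):
        score = _word_overlap(example.utterance, utterance)
        if example.previous_act == previous_act:
            score += 0.5  # hypothetical bonus for matching discourse history
        if score > best_score:
            best, best_score = example, score
    if best is None or best_score < min_score:
        # Hypothetical situation-based rule for meta-level dialogue management.
        return "Sorry, could you say that another way?"
    return best.system_response
```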

Cognitive neuropsychological assessment in pure alexic patient with letter-by-letter reading using fMRI - Single case study - (주변성 난독증의 특성과 대뇌활성화 양상 - 단일사례연구 -)

  • Sohn, Hyo-Jeong;Pyun, Sung-Bom;Kim, Chung-Myung;Nam, Ki-Chun
    • 2005
  • In this study, we used fMRI to investigate the cognitive neuropsychological characteristics and underlying mechanisms in a dyslexic patient who read letter by letter following an infarct of the left posterior cerebral artery. The cognitive neuropsychological assessment showed that visual perception was normal and that performance on the semantic categorization, picture naming, and picture-word matching tasks was above 83% correct in each. However, she performed very poorly on the lexical decision task. The selective reading impairment is thought to result from disruption of the left occipitotemporal region, including the fusiform gyrus. In the fMRI results, activation in the right occipitotemporal region, including the fusiform gyrus, was increased compared with the normal group, compensating for the left-hemisphere impairment, and it increased further in the pseudoword reading task than in the word reading task, owing to differences in familiarity.

Prediction of Prosodic Boundary Strength by means of Three POS(Part of Speech) sets (품사셋에 의한 운율경계강도의 예측)

  • Eom Ki-Wan;Kim Jin-Yeong;Kim Seon-Mi;Lee Hyeon-Bok
    • no.35_36
    • pp.145-155
    • 1998
  • This study aimed to determine the most appropriate POS (part-of-speech) set for predicting prosodic boundary strength efficiently. We used three levels of POS sets devised by Kim (1997), one of the authors. The three POS sets differ in how much grammatical information they contain: the first set has the most syntactic and morphological information that may affect prosodic phrasing, and the third set the least. We hand-labelled 150 sentences with each of the three POS sets and conducted a perception test. Based on the results of the test, a stochastic language modeling method was used to predict prosodic boundary strength. The results showed that the three POS sets did not differ much in prediction efficiency, although the second set was slightly more efficient than the other two. In terms of the complexity of stochastic language modeling, however, the third set may also be preferable.
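As an illustration of predicting boundary strength from POS tags, the sketch below estimates P(strength | POS_i, POS_i+1) by relative frequency and picks the most probable strength at each word juncture. This simple juncture model is a stand-in for the authors' stochastic language model, and the tag names and strength values in the example are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Sequence, Tuple

Juncture = Tuple[str, str]  # (POS of word i, POS of word i+1)


def train(corpus: Sequence[Sequence[Tuple[str, int]]]) -> Dict[Juncture, Counter]:
    """Count boundary strengths observed at each POS juncture.

    Each sentence is a list of (POS tag, boundary strength after that word)."""
    counts: Dict[Juncture, Counter] = defaultdict(Counter)
    for sentence in corpus:
        for (tag, strength), (next_tag, _) in zip(sentence, sentence[1:]):
            counts[(tag, next_tag)][strength] += 1
    return dict(counts)


def predict(tags: Sequence[str], counts: Dict[Juncture, Counter], default: int = 0) -> List[int]:
    """Pick argmax_s P(s | POS_i, POS_i+1) at every juncture; unseen pairs get default."""
    return [
        counts[(a, b)].most_common(1)[0][0] if (a, b) in counts else default
        for a, b in zip(tags, tags[1:])
    ]
```

A coarser tag set, like the third set in the study, produces fewer distinct junctures and therefore a smaller model, which is the complexity trade-off the abstract mentions.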

