• Title/Summary/Keyword: speech cues


Individual differences in categorical perception: L1 English learners' L2 perception of Korean stops

  • Kong, Eun Jong
    • Phonetics and Speech Sciences
    • /
    • v.11 no.4
    • /
    • pp.63-70
    • /
    • 2019
  • This study investigated individual variability in L2 learners' categorical judgments of L2 stops by exploring English learners' perceptual processing of two acoustic cues (voice onset time [VOT] and f0) and their working memory capacity as sources of variation. As prior research has reported that English speakers' greater use of the redundant cue f0 was responsible for gradient processing of native stops, we examined whether the same processing characteristics would be observed in L2 learners' perception of Korean stops (/t/-/tʰ/). Twenty-two English learners of L2 Korean with a range of L2 proficiency participated in a visual analogue scaling task and judged the L2 Korean stops in variable ways: some were more gradient than others in performing the task. Correlation analysis revealed that L2 learners' categorical responses were modestly related to their use of a primary cue for the stop contrast (VOT for L1 English stops and f0 for L2 Korean stops) and to better working memory capacity. Together, the current experimental evidence demonstrates adult L2 learners' top-down processing of stop consonants, in which linguistic and cognitive resources are devoted to determining abstract phonemic identity.

The relationship between vowel production and proficiency levels in L2 English produced by Korean EFL learners

  • Lee, Seohee;Rhee, Seok-Chae
    • Phonetics and Speech Sciences
    • /
    • v.11 no.2
    • /
    • pp.1-13
    • /
    • 2019
  • This study explored the relationship between accurate vowel production and proficiency levels in L2 English produced by Korean EFL adult learners. To this end, nine English vowels /i, ɪ, ɛ, æ, ʌ, ɔ, ɑ, ʊ, u/ were selected and adjacent vowels were paired up (e.g., /i/-/ɪ/, /u/-/ʊ/, /ɛ/-/æ/, /ʌ/-/ɔ/, /ɔ/-/ɑ/). The spectral features of the pairs, namely F1 (indexing tongue height) and F2 (indexing tongue backness), were measured instrumentally. In addition, the durations as well as the spectral features of the tense-lax counterparts /i/-/ɪ/ and /u/-/ʊ/ were measured, as both temporal and spectral features are important in distinguishing them. The findings confirm that higher-rated speakers were better able to distinguish the contrasts in the front vowel pairs /i/-/ɪ/ and /ɛ/-/æ/ than lower-rated speakers, but in the central and back vowel pairs /u/-/ʊ/ and /ʌ/-/ɔ/ (though not /ɔ/-/ɑ/), Korean EFL learners generally had difficulty distinguishing adjacent vowels with spectral cues. On the other hand, the durations of the tense and lax vowels showed that the lower-rated speakers were less able to use the temporal feature to differentiate tense vowels from their lax counterparts, contrary to previous findings that Korean learners in general depend excessively on the temporal cue to distinguish tense and lax vowels.
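
The spectral separation described above can be quantified directly: a common measure is the Euclidean distance between a vowel pair's mean (F1, F2) values, with a smaller distance indicating a less distinct contrast. The sketch below uses hypothetical formant values, not the paper's measurements:

```python
import numpy as np

# Hypothetical mean formant values (F1, F2) in Hz for one speaker; the
# paper's actual speakers and measurements are not reproduced here.
vowels = {
    "i": (280, 2300), "I": (400, 2000),   # tense/lax front pair /i/-/ɪ/
    "u": (310, 900),  "U": (430, 1100),   # tense/lax back pair /u/-/ʊ/
}

def contrast_distance(v1, v2):
    """Euclidean distance between two vowels in F1 x F2 space (Hz)."""
    a = np.array(vowels[v1], dtype=float)
    b = np.array(vowels[v2], dtype=float)
    return float(np.linalg.norm(a - b))

front = contrast_distance("i", "I")   # front-pair separation
back = contrast_distance("u", "U")    # back-pair separation
```

In production studies the distance is often computed on Bark- or mel-transformed formants rather than raw Hz, so that it better reflects auditory spacing.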

Effects of age of L2 acquisition and L2 experience on the production of English vowels by Korean speakers

  • Eunhae Oh;Eunyoung Shin
    • Phonetics and Speech Sciences
    • /
    • v.15 no.3
    • /
    • pp.9-16
    • /
    • 2023
  • The current study investigated the influence of age of L2 acquisition (AOA) and length of residence (LOR) in the L2-speaking country on the production of voicing-conditioned vowel duration and spectral qualities in English by Korean learners. The primary aim was to explore how language-specific phonetic features are acquired as a function of onset age and L2 experience. Analyses of archived corpus data produced by 45 native speakers of Korean showed that, regardless of AOA or LOR, absolute vowel duration was used as a salient correlate of the voicing contrast in English by Korean learners. The accuracy of relative vowel duration was influenced more by onset age than by L2 experience, suggesting that exposure to English at an early age may benefit the acquisition of the temporal dimension. On the other hand, the spectral characteristics of English vowels were more consistently influenced by L2 experience, indicating that immersive experience in the L2-speaking environment is likely to improve the accurate production of vowel quality. The distinct influence of onset age and L2 experience on specific phonetic cues in L2 vowel production provides insight into the intricate relationship between the two factors in the manifestation of L2 phonological knowledge.

A STUDY ON THE IMPLEMENTATION OF ARTIFICIAL NEURAL NET MODELS WITH FEATURE SET INPUT FOR RECOGNITION OF KOREAN PLOSIVE CONSONANTS (한국어 파열음 인식을 위한 피쳐 셉 입력 인공 신경망 모델에 관한 연구)

  • Kim, Ki-Seok;Kim, In-Bum;Hwang, Hee-Yeung
    • Proceedings of the KIEE Conference
    • /
    • 1990.07a
    • /
    • pp.535-538
    • /
    • 1990
  • The main problem in speech recognition is the enormous variability in acoustic signals due to complex but predictable contextual effects. In plosive consonants especially, it is very difficult to find an invariant cue because of these contextual effects, yet humans use them as helpful information in plosive consonant recognition. In this paper we experimented with three artificial neural net models for the recognition of plosive consonants. Neural Net Model I used a multi-layer perceptron, Model II used a variation of the self-organizing feature map model, and Model III used an interactive and competitive model to examine contextual effects. The recognition experiment was performed on the 9 Korean plosive consonants /g, d, b, K, T, P, k, t, p/ (/ㄱ, ㄷ, ㅂ, ㄲ, ㄸ, ㅃ, ㅋ, ㅌ, ㅍ/), using VCV speech chains built from these consonants and eight Korean monophthongs to study contextual effects. The inputs to the neural net models were several temporal cues (durations of the silence, the transition, and the VOT) together with the extent of the VC formant transitions, the presence of voicing energy during closure, burst intensity, presence of aspiration, the amount of low-frequency energy present at voicing onset, and the CV formant transition extent, all extracted from the acoustic signals. Model I achieved about 55-67%, Model II about 60%, and Model III about 67% recognition rates.
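
The abstract does not give enough detail to reproduce the three models, but the gist of Model I (a multi-layer perceptron over acoustic cue features) can be sketched with synthetic data. The feature values, three-class structure, and network size below are illustrative assumptions, not the paper's setup:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for two temporal cues (VOT, closure-silence duration in ms),
# one cluster per laryngeal class (lenis / fortis / aspirated). Synthetic.
means = np.array([[60.0, 80.0], [15.0, 130.0], [95.0, 90.0]])
X = np.vstack([m + rng.normal(0, 5, (50, 2)) for m in means])
y = np.repeat(np.arange(3), 50)
X = (X - X.mean(0)) / X.std(0)          # normalize features

# One-hidden-layer perceptron trained with softmax cross-entropy.
W1 = rng.normal(0, 0.5, (2, 8)); b1 = np.zeros(8)
W2 = rng.normal(0, 0.5, (8, 3)); b2 = np.zeros(3)
T = np.eye(3)[y]                         # one-hot targets

for _ in range(500):
    H = np.tanh(X @ W1 + b1)             # hidden layer
    Z = H @ W2 + b2
    P = np.exp(Z - Z.max(axis=1, keepdims=True))
    P /= P.sum(axis=1, keepdims=True)    # softmax
    G = (P - T) / len(X)                 # dLoss/dZ
    GH = (G @ W2.T) * (1 - H ** 2)       # backprop through tanh
    W2 -= 0.5 * H.T @ G; b2 -= 0.5 * G.sum(0)
    W1 -= 0.5 * X.T @ GH; b1 -= 0.5 * GH.sum(0)

H = np.tanh(X @ W1 + b1)
acc = ((H @ W2 + b2).argmax(1) == y).mean()
```

The real experiment fed richer cues (formant transition extents, burst intensity, aspiration, etc.) and nine consonant classes; the training loop keeps the same shape.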


Perceptual training on Korean obstruents for Vietnamese learners (베트남 한국어 학습자를 위한 한국어 자음 지각 훈련 연구)

  • Hyosung Hwang
    • Phonetics and Speech Sciences
    • /
    • v.15 no.4
    • /
    • pp.17-26
    • /
    • 2023
  • This study aimed to reveal how Vietnamese adult learners at three proficiency levels perceive Korean word-initial obstruents and whether perception errors can be corrected through perceptual training. To this end, 105 Vietnamese beginner, intermediate, and advanced learners were given perceptual training on Korean word-initial obstruents. The training materials were created by actively using Korean minimal pairs as natural stimuli recorded by native speakers. Learners in the experimental group completed five 20-40 minute self-directed perceptual training sessions over a period of approximately two weeks, while learners in the control group participated only in the pretest and posttest. The results showed a significant improvement in the perception of sounds that were difficult to distinguish before training, and both beginner and advanced learners benefited from the training. This study confirmed that large-scale perceptual training can play an important role in helping Vietnamese learners acquire the appropriate acoustic cues for distinguishing Korean sounds.

Sound Source Localization Technique at a Long Distance for Intelligent Service Robot (지능형 서비스 로봇을 위한 원거리 음원 추적 기술)

  • Lee Ji-Yeoun;Hahn Min-Soo
    • MALSORI
    • /
    • no.57
    • /
    • pp.85-97
    • /
    • 2006
  • This paper suggests an algorithm that can estimate the direction of a sound source in real time. The algorithm uses the time differences and sound intensity information among the signals recorded by four microphones. In addition, a Kalman filter is implemented to deal with the robot's own noise. The proposed method requires a shorter execution time than an existing algorithm, making it suitable for a real-time service robot. With the Kalman filter, the signal-to-noise ratio (SNR) relative to the background noise is improved by approximately 8 dB, and the azimuth estimates show a relatively small error, within ±7 degrees.
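
The abstract does not detail the algorithm, but the core idea of estimating direction from inter-microphone time differences can be sketched for a single far-field pair: the delay τ found by cross-correlation gives the azimuth as θ = arcsin(cτ/d). The sample rate, mic spacing, and broadband test signal below are assumptions:

```python
import numpy as np

fs = 16000          # sample rate (Hz); assumption
c = 343.0           # speed of sound (m/s)
d = 0.2             # assumed mic spacing (m); the paper's array geometry differs

rng = np.random.default_rng(1)
s = rng.normal(size=4096)                     # broadband source signal

true_az = 30.0                                 # degrees
delay = d * np.sin(np.radians(true_az)) / c    # inter-mic delay (s)
n = int(round(delay * fs))                     # nearest integer-sample delay

x1 = s
x2 = np.roll(s, n)   # mic 2 receives the signal n samples later (circular shift)

# Estimate the inter-mic delay from the peak of the cross-correlation.
corr = np.correlate(x2, x1, mode="full")
lag = int(corr.argmax()) - (len(x1) - 1)       # samples; positive => x2 lags x1

est_az = np.degrees(np.arcsin(np.clip(lag / fs * c / d, -1.0, 1.0)))
```

With integer-sample delays the estimate is quantized (a few degrees of error here); real systems interpolate the correlation peak and combine several mic pairs, as the four-microphone setup above allows.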


The Acoustic Realization of Phrasal Verb vs. Verb-preposition (구절 동사와 전치사 수반동사의 의미에 따른 음성적 실현)

  • Kim, Hee-Sung;Song, Ji-Yeon;Kim, Kee-Ho
    • MALSORI
    • /
    • no.63
    • /
    • pp.67-84
    • /
    • 2007
  • A verb phrase can have two different meanings depending on whether the verb is followed by an adverb or a preposition: the meaning of 'verb + adverb' is deduced figuratively as an idiomatic expression, while 'verb + preposition' is interpreted literally. The purpose of this study is to observe how English native speakers and Korean learners of English distinguish two sentences with the same word strings using acoustic cues such as pause and duration. According to the results, when pause was used for meaning distinction, the pause preceding a preposition tended to be longer than that preceding an adverb. To distinguish two sentences with the same word strings, all participants appeared to use pause, verb lengthening, and adverb/preposition lengthening, with a hierarchy of significance in that order: pause first, then verb lengthening, then adverb/preposition lengthening.


A study on the voiceless plosives from the English and Korean spontaneous speech corpus (영어와 한국어 자연발화 음성 코퍼스에서의 무성 파열음 연구)

  • Yoon, Kyuchul
    • Phonetics and Speech Sciences
    • /
    • v.11 no.4
    • /
    • pp.45-53
    • /
    • 2019
  • The purpose of this work was to examine the factors affecting the identities of the voiceless plosives, i.e., English [p, t, k] and Korean [pʰ, tʰ, kʰ], in spontaneous speech corpora. The factors were automatically extracted by a Praat script, and the percent correctness of the discriminant analyses was assessed incrementally by increasing the number of factors used to predict the identities of the plosives. The factors included the spectral moments and tilts of the plosive release bursts, the post-burst aspirations, and the vowel onsets; durations such as the closure durations and the voice onset times (VOTs); the locations within words and utterances; and the identities of the following vowels. The results showed that as the number of factors increased up to five, so did the percent correctness of the analyses, reaching 74.6% for English and 66.4% for Korean. However, the optimal number of factors for the maximum percent correctness was four, i.e., the spectral moments and tilts of the release bursts and the following vowels, the closure durations, and the VOTs. This suggests that the identities of the voiceless plosives are mostly determined by their internal cues and vowel-onset cues.
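
The incremental design, where factors are added one at a time and percent correctness is re-measured, can be sketched with a two-class Fisher discriminant on synthetic data. The paper used Praat-extracted factors; the features, class structure, and data below are stand-ins:

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic stand-in for four cue factors (e.g., burst spectral moment, tilt,
# closure duration, VOT) for two plosive categories; not the paper's data.
n, k = 500, 4
mu0 = np.zeros(k)
mu1 = np.array([1.0, 0.8, 0.6, 0.4])   # later factors are less informative
X = np.vstack([rng.normal(mu0, 1.0, (n, k)),
               rng.normal(mu1, 1.0, (n, k))])
y = np.repeat([0, 1], n)

def lda_accuracy(X, y):
    """Fit a two-class Fisher discriminant; return in-sample percent correct."""
    m0, m1 = X[y == 0].mean(0), X[y == 1].mean(0)
    Sw = np.cov(X[y == 0], rowvar=False) + np.cov(X[y == 1], rowvar=False)
    w = np.linalg.solve(np.atleast_2d(Sw), m1 - m0)   # Fisher direction
    thr = w @ (m0 + m1) / 2                            # midpoint threshold
    pred = (X @ w > thr).astype(int)
    return (pred == y).mean()

# Percent correct as factors are added incrementally, as in the paper's design.
accs = [lda_accuracy(X[:, :j], y) for j in range(1, k + 1)]
```

With informative factors `accs` typically rises as factors are added; with real cues it can plateau or dip once redundant factors enter, which is consistent with the paper's optimum at four factors.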

Statistical Model-Based Voice Activity Detection Using Spatial Cues for Dual-Channel Noisy Speech Recognition (이중채널 잡음음성인식을 위한 공간정보를 이용한 통계모델 기반 음성구간 검출)

  • Shin, Min-Hwa;Park, Ji-Hun;Kim, Hong-Kook
    • Proceedings of the Korean Society of Broadcast Engineers Conference
    • /
    • 2010.07a
    • /
    • pp.150-151
    • /
    • 2010
  • This paper proposes a statistical model-based voice activity detection (VAD) method for dual-channel speech recognition in noisy environments. In the proposed method, spatial cues obtained from the multi-channel input signals are used to build speech-presence and speech-absence probability models, through which voice activity is detected. The spatial cues are the inter-channel time difference and inter-channel level difference between the two channels, and the speech-presence and speech-absence probabilities are represented by Gaussian-kernel-density-based probability models. Speech segments are then detected by estimating, for each time frame, the ratio of the speech-presence probability to the speech-absence probability. To evaluate the proposed VAD method, speech recognition performance is measured using only the detected segments as input. Experimental results show that the proposed statistical model-based VAD method using spatial cues achieves relative word error rate improvements of 15.6% and 15.4% over a statistical model-based VAD method using frequency-band energy and a spectral-density-based VAD method, respectively.
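
The frame-wise presence/absence ratio test can be sketched as follows, with Gaussian kernel density estimates over two spatial-cue features standing in for the paper's models. All feature distributions below are synthetic assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic stand-in for per-frame spatial cues (inter-channel time and level
# differences); the paper's recordings and models are not reproduced.
speech = rng.normal([0.5, 6.0], [0.1, 1.0], (300, 2))   # speech-frame training cues
noise = rng.normal([0.0, 0.0], [0.3, 2.0], (300, 2))    # noise-frame training cues

def kde(train, x, h=0.3):
    """Gaussian kernel density estimate of points x given training samples.
    The normalization constant is omitted; it cancels in the ratio below."""
    diff = (x[:, None, :] - train[None, :, :]) / h
    return np.exp(-0.5 * (diff ** 2).sum(-1)).mean(1)

# Test frames: 100 with speech, 100 noise-only, drawn from the same models.
test = np.vstack([rng.normal([0.5, 6.0], [0.1, 1.0], (100, 2)),
                  rng.normal([0.0, 0.0], [0.3, 2.0], (100, 2))])
labels = np.repeat([1, 0], 100)

# Frame-wise decision from the speech-presence / speech-absence ratio.
ratio = kde(speech, test) / (kde(noise, test) + 1e-12)
vad = (ratio > 1.0).astype(int)
acc = (vad == labels).mean()
```

A real system would also smooth the frame decisions over time (hangover) before passing the detected segments to the recognizer.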


Articulatory Attributes in Korean Nonassimilating Contexts

  • Son, Minjung
    • Phonetics and Speech Sciences
    • /
    • v.5 no.1
    • /
    • pp.109-121
    • /
    • 2013
  • This study examined several kinematic properties of the primary articulator (the tongue dorsum) and the supplementary articulator (the jaw) in the articulation of the voiceless velar stop (/k/) within nonassimilating contexts. We examined in particular the spatiotemporal properties (constriction duration and constriction maxima) from the constriction onset to the constriction offset by analyzing a velar (/k/) followed by the coronal fricative (/s/), the coronal stop (/t/), and the labial (/p/) in across-word boundary conditions (/k#s/, /k#t/, and /k#p/). Along with these measurements, we investigated the intergestural temporal coordination between C1 and C2 and the jaw articulator in relation to its coordination with the articulation of the consonant sequences. The articulatory movement data were collected by means of electromagnetic midsagittal articulometry (EMMA). Four native speakers of Seoul Korean participated in the laboratory experiment. The results showed several characteristics. First, the velar (/k/) in C1 was not categorically reduced. Constriction duration and constriction degree of the velar (/k/) were similar within nonassimilating contexts (/k#s/=/k#t/=/k#p/). This might mean that spatiotemporal attributes during constriction duration were stable and consistent across different contexts, which might subsequently be associated with the nontarget status of the velar in place assimilation. Second, the gestural overlap could be represented in the order /k#s/ (less) < /k#p/ (intermediate) < /k#t/ (more), as measured by the onset-to-onset lag (a longer lag indicated shorter gestural overlap).
This indicates that gestural overlap within nonassimilating contexts may not be constrained by any of several proposed constraints: the perceptual recoverability constraint (e.g., more overlap in front-to-back sequences than in the reverse, back-to-front, order, since the perceptual cues of C1 can be recovered at any time during C2 articulation), the low-level speech motor constraint (e.g., more overlap in lingual-nonlingual sequences than in lingual-lingual sequences), or phonological context effects (e.g., similar gestural overlap within nonassimilating contexts). As one possible account for the greater overlap in /k#t/ than in /k#p/ sequences, we suspect that speakers' knowledge may tolerate extreme encroachment on C1 by the overlapping coronal gesture in C2, since it does not obscure the perceptual cue of C1 as much as a labial in C2 does. Third, the actual jaw position during C2 was higher for coronals (/s/, /t/) than for the labial (/p/). However, within the coronals there was no manner-dependent difference in C2 jaw height (/s/=/t/). The vertical jaw positions of C1 and C2 were interdependent, as a higher jaw position in C1 was closely associated with that in C2. Lastly, a greater gap in jaw height was associated with longer intergestural timing (i.e., less overlap), but this was confined to the cluster type with the lingual-nonlingual sequence (/kp/). This study showed that Korean jaw articulation was independent of the coordination of the primary articulators in gestural overlap for some cluster types (/k#s/, /k#t/) but not for others (e.g., /k#p/). Overall, the results coherently indicate that the velar stop (/k/) in C1 was robust in articulation, which may have contributed to the nontarget status of the velar (/k/) in place assimilation processes.