• Title/Summary/Keyword: Phonetics

Closure durations of Korean stops at three positions

  • Yungdo Yun
    • Phonetics and Speech Sciences / v.14 no.4 / pp.11-17 / 2022
  • This study investigates closure durations of Korean stops in terms of laryngeal contrasts, places of articulation, and three positions within words. Twenty-two Korean speakers produced nonsense words containing Korean stops in word-initial and word-final positions and between vowels. The statistical results showed that the closure durations differed significantly by laryngeal contrast and place of articulation. In addition, the differences by position within words were marginally significant. The closure durations were in the order of lenis < aspirated < fortis stops by laryngeal contrast, velar < alveolar < bilabial stops by place of articulation, and word-final < word-initial < between vowels by position within words. The laryngeal contrasts were neutralized in word-final position as per coda neutralization in Korean phonology. This study shows that closure durations should be considered a valuable phonetic cue to identify stops on par with voice onset time and f0.

Perceptual weighting on English lexical stress by Korean learners of English

  • Goun Lee
    • Phonetics and Speech Sciences / v.14 no.4 / pp.19-24 / 2022
  • This study examined which acoustic cue(s) Korean learners of English give weight to in perceiving English lexical stress. We manipulated segmental and suprasegmental cues in 5 steps in the first and second syllables of the English stress minimal pair "object". A total of 27 subjects (14 native speakers of English and 13 Korean L2 learners) participated in the English stress judgment task. The results revealed that native Korean listeners used the F0 and intensity cues in identifying English stress and weighted vowel quality most strongly, as native English listeners did. These results indicate that Korean learners' experience with these cues in L1 prosody can help them attend to these cues in their L2 perception. However, L2 learners' perceptual attention is not entirely predicted by their linguistic experience with specific acoustic cues in their native language.

Japanese and Korean speakers' production of Japanese fricative /s/ and affricate /ts/

  • Yamakawa, Kimiko;Amano, Shigeaki
    • Phonetics and Speech Sciences / v.14 no.1 / pp.13-19 / 2022
  • This study analyzed the pronunciations of Japanese fricative /s/ and affricate /ts/ by 24 Japanese and 40 Korean speakers using the rise and steady+decay durations of their frication part in order to clarify the characteristics of their pronunciations. Discriminant analysis revealed that Japanese speakers' /s/ and /ts/ were well classified by the acoustic boundaries defined by a discriminant function. Using this boundary, Korean speakers' production of /s/ and /ts/ was analyzed. It was found that, in Korean speakers' pronunciation, misclassification of /s/ as /ts/ was more frequent than that of /ts/ as /s/, indicating that both the /s/ and /ts/ distributions shift toward short rise and steady+decay durations. Moreover, their distributions were very similar to those of Korean fricatives and affricates. These results suggest that Korean speakers' classification error might be because of their use of Korean lax and tense fricatives to pronounce Japanese /s/, and Korean lax and tense affricates to pronounce Japanese /ts/.

Comparing English and Korean speakers' word-final /rl/ clusters using dynamic time warping

  • Cho, Hyesun
    • Phonetics and Speech Sciences / v.14 no.1 / pp.29-36 / 2022
  • The English word-final /rl/ cluster poses a particular problem for Korean learners of English because it is a sequence of two sounds, /r/ and /l/, which are not contrastive in Korean. This study compared the similarity distances between English and Korean speakers' /rl/ productions using the dynamic time warping (DTW) algorithm. The words with /rl/ (pearl, world) and without /rl/ (bird, word) were recorded by four English speakers and four Korean speakers, and compared pairwise. The F2-F1 trajectories, the acoustic correlate of velarized /l/, and F3 trajectories, the acoustic correlate of /r/, were examined. Formant analysis showed that English speakers lowered F2-F1 values toward the end of a word, unlike Korean speakers, suggesting the absence of /l/ in Korean speakers' productions. In contrast, there was no significant difference in F3 values. Mixed-effects regression analyses of the DTW distances revealed that Korean speakers produced /r/ similarly to English speakers but failed to produce the velarized /l/ in /rl/ clusters.
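
For readers unfamiliar with DTW, the following is a minimal sketch of the textbook algorithm applied to two one-dimensional trajectories (for instance, F3 values sampled over time). It is not the authors' implementation: the function name dtw_distance, the absolute-difference local cost, and the absence of any normalization or windowing are all assumptions made for illustration.

```python
def dtw_distance(a, b):
    """Textbook DTW distance between two 1-D numeric sequences.

    Assumption: the local cost is the absolute difference between
    samples; the abstract does not state which cost the study used.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = cheapest alignment of a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the best of: step in a only, step in b only, step in both
            cost[i][j] = local + min(
                cost[i - 1][j],
                cost[i][j - 1],
                cost[i - 1][j - 1],
            )
    return cost[n][m]


# Hypothetical trajectories: same shape, different timing
slow = [1.0, 2.0, 2.0, 3.0]
fast = [1.0, 2.0, 3.0]
print(dtw_distance(slow, fast))
```

Because DTW lets one sample align with several samples of the other sequence, two trajectories that differ only in timing receive a distance of zero, which is why it suits comparing productions of different durations.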

Digital enhancement of pronunciation assessment: Automated speech recognition and human raters

  • Miran Kim
    • Phonetics and Speech Sciences / v.15 no.2 / pp.13-20 / 2023
  • This study explores the potential of automated speech recognition (ASR) in assessing English learners' pronunciation. We employed ASR technology, acknowledged for its impartiality and consistent results, to analyze speech audio files, including synthesized speech, both native-like English and Korean-accented English, and speech recordings from a native English speaker. Through this analysis, we establish baseline values for the word error rate (WER). These were then compared with those obtained for human raters in perception experiments that assessed the speech productions of 30 first-year college students before and after taking a pronunciation course. Our sub-group analyses revealed positive training effects for Whisper, an ASR tool, and human raters, and identified distinct human rater strategies in different assessment aspects, such as proficiency, intelligibility, accuracy, and comprehensibility, that were not observed in ASR. Despite such challenges as recognizing accented speech traits, our findings suggest that digital tools such as ASR can streamline the pronunciation assessment process. With ongoing advancements in ASR technology, its potential as not only an assessment aid but also a self-directed learning tool for pronunciation feedback merits further exploration.
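
As background on the metric named above, word error rate is conventionally the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and the ASR output, divided by the number of reference words. The sketch below illustrates that standard definition only; the function name word_error_rate and the whitespace tokenization are assumptions, and the study's own scoring pipeline (for example, any text normalization) is not described in the abstract.

```python
def word_error_rate(reference, hypothesis):
    """WER = word-level edit distance / number of reference words.

    Assumption: words are split on whitespace with no normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the reference words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # aligning i reference words with an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical transcript pair: one substituted word out of three
print(word_error_rate("the cat sat", "the hat sat"))
```

Note that WER can exceed 1.0 when the hypothesis contains many inserted words, so it is an error ratio rather than a bounded percentage.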

Zero-shot voice conversion with HuBERT

  • Hyelee Chung;Hosung Nam
    • Phonetics and Speech Sciences / v.15 no.3 / pp.69-74 / 2023
  • This study introduces an innovative model for zero-shot voice conversion that utilizes the capabilities of HuBERT. Zero-shot voice conversion models can transform the speech of one speaker to mimic that of another, even when the model has not been exposed to the target speaker's voice during the training phase. Comprising five main components (HuBERT, feature encoder, flow, speaker encoder, and vocoder), the model offers remarkable performance across a range of scenarios. Notably, it excels in the challenging unseen-to-unseen voice-conversion tasks. The effectiveness of the model was assessed based on the mean opinion scores and similarity scores, reflecting high voice quality and similarity to the target speakers. This model demonstrates considerable promise for a range of real-world applications demanding high-quality voice conversion. This study sets a precedent in the exploration of HuBERT-based models for voice conversion, and presents new directions for future research in this domain. Despite its complexities, the robust performance of this model underscores the viability of HuBERT in advancing voice conversion technology, making it a significant contributor to the field.

Analyzing vowel variation in Korean dialects using phone recognition

  • Jooyoung Lee;Sunhee Kim;Minhwa Chung
    • Phonetics and Speech Sciences / v.15 no.4 / pp.101-107 / 2023
  • This study proposes an automatic method of detecting vowel variation in the Korean dialects of Gyeong-sang and Jeol-la. The method is based on error patterns extracted using phone recognition. Canonical and recognized phone sequences are compared, and statistical analyses distinguish the vowels appearing in both dialects, the dialect-common vowels, and the vowels with high mismatch rates for each dialect. The dialect-common vowels show monophthongization of diphthongs. The vowels unique to the dialects are /we/ to [e] and /ʌ/ to [ɰ] for the Gyeong-sang dialect, and /ɰi/ to [ɯ] for the Jeol-la dialect. These results corroborate previous dialectology reports regarding phonetic realization of the Korean dialects. The current method opens the possibility of automatically explaining dialect patterns.
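
The comparison of canonical and recognized phone sequences can be pictured with a small sketch. The abstract does not say how the sequences were aligned, so the use of Python's difflib.SequenceMatcher here is a stand-in, and the function name substitution_counts and the example phone labels are hypothetical.

```python
from collections import Counter
from difflib import SequenceMatcher


def substitution_counts(canonical, recognized):
    """Count (canonical, recognized) phone substitutions.

    Assumption: alignment comes from difflib.SequenceMatcher, and only
    equal-length 'replace' spans are counted, paired position by position.
    """
    counts = Counter()
    matcher = SequenceMatcher(a=canonical, b=recognized, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "replace" and (i2 - i1) == (j2 - j1):
            for expected, observed in zip(canonical[i1:i2], recognized[j1:j2]):
                counts[(expected, observed)] += 1
    return counts


# Hypothetical utterance in which canonical /we/ is recognized as [e]
canonical = ["k", "we", "n", "a"]
recognized = ["k", "e", "n", "a"]
print(substitution_counts(canonical, recognized))
```

Summing such counts over many utterances per dialect, and dividing by how often each canonical vowel occurs, would give a per-vowel mismatch rate of the kind the abstract mentions.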

Attentional modulation on multiple acoustic cues in phonological processing of L2 sounds

  • Hyunjung Lee;Eun Jong Kong
    • Phonetics and Speech Sciences / v.15 no.4 / pp.11-16 / 2023
  • The present study examines how cognitive attention affects Korean learners of English (L2) in perceiving the English stop voicing distinction (/d/-/t/). This study tested the effect of an attentional distractor on primary and non-primary acoustic cues, focusing on the role of Voice Onset Time (VOT) and fundamental frequency (F0). Using the dual-task paradigm, 28 Korean adult learners of English participated in a stop identification task carried out with (distractor) and without (no-distractor) arithmetic calculation. Results showed that when distracted, Korean learners' sensitivity to VOT decreased, as previously reported for native English speakers. Furthermore, as F0 is a primary cue for the L1 Korean stop laryngeal contrast, its role in the L2 English voicing distinction was also affected by the distractor, without compensating for the reduced VOT sensitivity. These findings suggest that flexible use of multiple cues in L1 is not necessarily beneficial for L2 phonological processing when coping with an adverse listening condition.

Vowel epenthesis and stress-focus interaction in L2 speech perception

  • Goun Lee;Dong-Jin Shin
    • Phonetics and Speech Sciences / v.16 no.2 / pp.11-17 / 2024
  • The goal of the current study is to investigate whether L2 learners' perceptual ability regarding epenthetic vowels is interconnected with other aspects of speech recognition, such as lexical stress, sentence focus, and vowel recognition. Twenty-five Korean L2 learners of English participated in perception experiments assessing vowel epenthesis oddity, lexical stress oddity, sentence focus oddity, and vowel identification. Results indicate that accuracy on the vowel epenthesis oddity test is influenced by both lexical stress and sentence focus, suggesting that perceptual ability regarding epenthetic vowels is influenced by the acquisition of L2 rhythmic structure at both word and sentence levels. Additionally, this study identifies a proficiency effect on vowel epenthesis recognition, implying that the influence of L1 phonotactics diminishes as L2 proficiency increases. Taken together, this study illustrates the interaction between perceptual abilities in vowel epenthesis and prosodic stress in the field of L2 speech perception.

Exploring stress encoding cues in English by Korean L2 speakers

  • Goun Lee
    • Phonetics and Speech Sciences / v.16 no.3 / pp.33-38 / 2024
  • The present study investigated the perceptual cues utilized by Korean L2 learners of English in recognizing lexical stress in English nonwords, with a focus on the roles of fundamental frequency (F0) and duration. Twenty-three Korean learners of English participated in a sequence recall task involving nonword stimuli under five different conditions: (1) the naturally-produced stimuli, (2) the duration-only condition, (3) the F0-only condition, (4) the duration-F0 matching condition, and (5) the duration-F0 conflicting condition. The results demonstrate that F0 is the primary cue for stress perception among Korean L2 learners, whereas duration acts as a secondary cue, particularly when F0 is unreliable or absent. These findings highlight the influence of L1 prosodic structures on L2 perception and suggest that Korean L2 learners adapt their perceptual weighting of stress based on cue availability. This study contributes to the understanding of the role of cue weighting in L2 prosodic acquisition.