• Title/Summary/Keyword: English speech analysis

Automatic pronunciation assessment of English produced by Korean learners using articulatory features (조음자질을 이용한 한국인 학습자의 영어 발화 자동 발음 평가)

  • Ryu, Hyuksu;Chung, Minhwa
    • Phonetics and Speech Sciences / v.8 no.4 / pp.103-113 / 2016
  • This paper proposes articulatory features as novel predictors for the automatic pronunciation assessment of English produced by Korean learners. Based on distinctive feature theory, in which phonemes are represented as sets of articulatory/phonetic properties, we propose articulatory Goodness-of-Pronunciation (aGOP) features for the corresponding articulatory attributes, such as nasal, sonorant, and anterior. An English speech corpus spoken by Korean learners is used for assessment modeling. In our system, learners' speech is force-aligned and recognized using acoustic and pronunciation models derived from the WSJ corpus (native North American speech) and the CMU Pronouncing Dictionary, respectively. To compute aGOP features, articulatory models are trained for the corresponding articulatory attributes. In addition to the proposed features, features in four categories (RATE, SEGMENT, SILENCE, and GOP) are used as a baseline. To improve assessment modeling performance and to investigate the weights of the salient features, relevant features are selected using Best Subset Selection (BSS). The results show that the model using aGOP features outperforms the baseline. In addition, analysis of the features selected by BSS reveals that the chosen aGOP features capture the salient pronunciation variations of Korean learners of English. The results are also expected to be useful for automatic pronunciation error detection.
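
The aGOP idea can be illustrated with a minimal sketch. It assumes an articulatory model that outputs, for every frame of a force-aligned phone segment, the posterior probability that each attribute is present. By analogy with the usual frame-averaged GOP, aGOP is taken here to be the average log posterior of the attribute value that the canonical phone should have. The feature table, the frame posteriors, and the names `FEATURES` and `agop_scores` are hypothetical illustrations, not the authors' implementation.

```python
import math
from typing import Dict, List

# Hypothetical binary distinctive-feature table (1 = attribute present).
# Only a few ARPAbet phones and three attributes are shown for illustration.
FEATURES: Dict[str, Dict[str, int]] = {
    "M": {"nasal": 1, "sonorant": 1, "anterior": 1},
    "N": {"nasal": 1, "sonorant": 1, "anterior": 1},
    "B": {"nasal": 0, "sonorant": 0, "anterior": 1},
    "K": {"nasal": 0, "sonorant": 0, "anterior": 0},
}


def agop_scores(phone: str, frames: List[Dict[str, float]],
                floor: float = 1e-10) -> Dict[str, float]:
    """Return one aGOP score per attribute for a force-aligned phone segment.

    frames: one dict per frame mapping attribute -> P(attribute present | frame),
    as produced by a (hypothetical) articulatory attribute model.
    """
    if not frames:
        raise ValueError("segment has no frames")
    scores: Dict[str, float] = {}
    for attr, expected in FEATURES[phone].items():
        total = 0.0
        for frame in frames:
            p_present = frame[attr]
            # Posterior of the attribute value the canonical phone should have.
            p_expected = p_present if expected else 1.0 - p_present
            total += math.log(max(p_expected, floor))  # floor avoids log(0)
        scores[attr] = total / len(frames)  # average over the segment's frames
    return scores


# A learner's /m/ whose nasality is weak in the second frame (made-up values).
segment = [
    {"nasal": 0.9, "sonorant": 0.95, "anterior": 0.8},
    {"nasal": 0.4, "sonorant": 0.90, "anterior": 0.7},
]
print(agop_scores("M", segment))
```

Scores close to 0 mean the expected attribute value was detected with confidence. More negative scores flag a deviation, such as weak nasality. Scores of this kind could then serve as predictors alongside the RATE, SEGMENT, SILENCE, and GOP baseline features.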

An Analysis of $H^*$ Production by Korean Learners of English according to the Focus of English Sentences in Comparison with Native Speakers of English and Its Pedagogical Implications (영어 원어민과 비교한 한국인 학습자의 영어 문장 초점에 따른 영어 고성조 구현의 분석과 억양교육에 대한 시사점)

  • Yi, So-Pae
    • Phonetics and Speech Sciences / v.3 no.3 / pp.57-62 / 2011
  • Focused items in English sentences are usually accompanied by changes in their acoustic manifestation. This paper investigates the acoustic characteristics of $H^*$ in English utterances produced by native speakers of English and Korean learners of English. To obtain more reliable results, the changes in acoustic feature values (F0, intensity, syllable duration) were normalized by the median value and the total duration of each utterance. Within each group (Americans vs. Koreans), the acoustic values of sentences with no focused words were compared with those of sentences with focused words. Sentences with focused words were also compared between the two groups. Where a significant Group × Focus Location (sentence-initial, -medial, and -final) interaction was obtained, further analysis tested the effect of Group at each Focus Location. The analysis revealed that Korean learners of English produced focused words with lower F0, lower intensity, and shorter syllable duration than native speakers of English. However, the change in intensity caused by focus was not significant within either group. Further analysis of the Group × Focus Location interaction showed that the F0 change produced by the Korean group was significantly smaller in sentence-medial and sentence-final positions than that produced by the American group. Implications for intonation training are also discussed.
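
The within-utterance normalization step can be sketched as follows. The abstract does not say exactly how each measure was normalized, so this sketch makes three assumptions. F0 is expressed as a ratio to the utterance median. Intensity, which is already logarithmic in dB, is expressed as a difference from the utterance median. Each syllable duration is expressed as a proportion of the whole utterance duration. The function name `normalize_utterance` and the sample values are hypothetical.

```python
from statistics import median
from typing import Dict, List


def normalize_utterance(f0_hz: List[float], intensity_db: List[float],
                        syll_dur_s: List[float], utt_dur_s: float
                        ) -> Dict[str, List[float]]:
    """Normalize per-syllable measurements within a single utterance.

    Assumptions (not stated in the abstract): F0 as a ratio to the utterance
    median, intensity in dB as a difference from the utterance median, and
    syllable duration as a proportion of the whole utterance duration.
    """
    if utt_dur_s <= 0:
        raise ValueError("utterance duration must be positive")
    f0_med = median(f0_hz)
    int_med = median(intensity_db)
    return {
        "f0": [v / f0_med for v in f0_hz],
        "intensity": [v - int_med for v in intensity_db],
        "duration": [d / utt_dur_s for d in syll_dur_s],
    }


# Hypothetical three-syllable utterance with focus on the second syllable.
norm = normalize_utterance([180.0, 220.0, 200.0], [62.0, 70.0, 65.0],
                           [0.20, 0.35, 0.25], 0.80)
print(norm)
```

Because every value is relative to its own utterance, speakers with different pitch ranges, loudness levels, or speaking rates can be compared on the same scale.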

Korean speakers hyperarticulate vowels in polite speech

  • Oh, Eunhae;Winter, Bodo;Idemaru, Kaori
    • Phonetics and Speech Sciences / v.13 no.3 / pp.15-20 / 2021
  • In line with recent attention to the multimodal expression of politeness, the present study examined the association between polite speech and acoustic features by analyzing vowels produced in casual and polite speech contexts in Korean. Fourteen adult native speakers of Seoul Korean produced utterances in two social conditions designed to elicit polite (professor) and casual (friend) speech. Vowel duration and the first (F1) and second (F2) formants of seven sentence- and phrase-initial monophthongs were measured. The results showed that polite speech shares acoustic similarities with clear speech: speakers showed greater vowel space expansion in polite than in casual speech, in an effort to enhance perceptual intelligibility. In particular, female speakers hyperarticulated (front) vowels in polite speech, independent of speech rate. Implications for the acoustic encoding of social stance in polite speech are also discussed.
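
Vowel space expansion is often quantified as the area of the polygon spanned by the mean (F2, F1) values of a speaker's vowels. The abstract does not state which expansion measure the study used. The convex-hull area below is therefore one common choice offered as an assumption, not the authors' method. The hull uses Andrew's monotone chain algorithm, and the area uses the shoelace formula. The function names and the sample formant values are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (F2, F1) in Hz


def _cross(o: Point, a: Point, b: Point) -> float:
    """z-component of (a - o) x (b - o); positive for a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points: List[Point]) -> List[Point]:
    """Andrew's monotone chain; returns hull vertices counter-clockwise."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and _cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and _cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]  # drop duplicated endpoints


def polygon_area(vertices: List[Point]) -> float:
    """Shoelace formula for a simple polygon given in vertex order."""
    n = len(vertices)
    s = 0.0
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0


def vowel_space_area(vowel_means: List[Point]) -> float:
    """Area (Hz^2) of the convex hull of per-vowel mean (F2, F1) points."""
    return polygon_area(convex_hull(vowel_means))


# Hypothetical per-vowel means for one speaker in one condition.
casual = [(2200.0, 350.0), (1300.0, 750.0), (900.0, 400.0), (1500.0, 500.0)]
print(vowel_space_area(casual))
```

Computing this area separately for the polite and casual conditions gives one number per speaker and condition. A larger polite-condition area would correspond to the expansion the abstract reports.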

Development of English Stress and Intonation Training System and Program for the Korean Learners of English Using Personal Computer (P.C.) (퍼스컴을 이용한 영어 강세 및 억양 교육 프로그램의 개발 연구)

  • Jeon, B.M.;Pae, D.B.;Lee, C.H.;Yu, C.K.
    • Speech Sciences / v.5 no.2 / pp.57-75 / 1999
  • The purpose of this paper is to develop a PC-based English prosody training system for Korean learners of English. The program, called the Intonation Training Tool (ITT), runs under DOS 5.0. It requires an IBM PC 386 or higher with at least 4 MB of main memory, an SVGA display adapter with 1 MB or more of memory, a Sound Blaster 16, and a monitor of 14 inches or larger. The ITT works as follows. Learners can both hear the English teacher's stress and intonation patterns and see them on the monitor, and then practice the same patterns with a microphone. The program overlays the learner's stress and intonation patterns on the teacher's, so learners can find and correct their own stress and intonation errors independently. The program is expected to be a highly efficient tool for the English prosody training of Korean learners in English classes without a native English speaker in the classroom.
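
A teacher's and a learner's pitch contours of different lengths can only be overlaid point by point after they are mapped onto a common time axis. The abstract does not describe how the ITT does this. The sketch below is therefore a modern illustration, not the original DOS program. It uses linear time normalization to a fixed number of points and adds an RMS difference as one possible summary of the mismatch. It assumes each contour is already continuous, with unvoiced gaps interpolated. The names `resample` and `overlay` are hypothetical.

```python
import math
from typing import List, Tuple


def resample(contour: List[float], n: int) -> List[float]:
    """Linearly resample a contour to n evenly spaced points."""
    if n < 2 or len(contour) < 2:
        raise ValueError("need at least two input and two output points")
    step = (len(contour) - 1) / (n - 1)
    out = []
    for i in range(n):
        pos = i * step
        j = min(int(pos), len(contour) - 2)  # left neighbour index
        frac = pos - j
        out.append(contour[j] * (1.0 - frac) + contour[j + 1] * frac)
    return out


def overlay(teacher: List[float], learner: List[float], n: int = 100
            ) -> Tuple[List[float], List[float], float]:
    """Time-normalize both contours to n points; also return their RMS difference.

    Assumption: both contours are continuous F0 tracks (unvoiced gaps filled).
    """
    t = resample(teacher, n)
    s = resample(learner, n)
    rms = math.sqrt(sum((a - b) ** 2 for a, b in zip(t, s)) / n)
    return t, s, rms


teacher_f0 = [120.0, 180.0, 150.0, 110.0]          # hypothetical Hz values
learner_f0 = [130.0, 140.0, 135.0, 125.0, 120.0]
_, _, diff = overlay(teacher_f0, learner_f0, n=50)
print(f"RMS F0 difference: {diff:.1f} Hz")
```

The two resampled lists can be plotted on the same axes to produce the kind of visual overlay the abstract describes.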

An Analysis of the English l Sound Produced by Korean Students

  • Yang, Byung-Gon
    • Speech Sciences / v.15 no.1 / pp.53-62 / 2008
  • The purpose of this study was to examine the English l sound in an English short story read by 16 Korean students, in order to identify its various allophones using acoustic visual displays and perceptual judgments. The subjects read the story in a quiet office at normal speed. Each target word contained the lateral in onset or coda position, or before a vowel in the following word. The results were as follows. First, there was a durational difference between the two major groups. In addition, the majority of the subjects produced the clear l regardless of context. Some students produced the sound as the Korean flap or as the English glide [r]. In a few cases the sound was omitted. The dark l was produced mostly by the English majors, in coda position, with a few cases before a vowel within a phrase. Visual displays from computer analysis were very helpful in distinguishing lateral variants, but perceptual judgment was sometimes necessary for fast or weakly produced target words. Further studies are needed to examine the discrepancies between acoustic and perceptual decisions.

Speech Rhythm Metrics for Automatic Scoring of English Speech by Korean EFL Learners

  • Jang, Tae-Yeoub
    • MALSORI / no.66 / pp.41-59 / 2008
  • Knowledge of the linguistic rhythm of the target language plays a major role in foreign language proficiency. This study attempts to identify valid rhythm features that can be used in the automatic assessment of non-native English pronunciation. Eight previously proposed and two novel rhythm metrics are investigated using 360 English read-speech tokens obtained from 27 Korean learners and 9 native speakers. Some of the speech-rate-normalized interval measures and above-word-level metrics are found to be effective enough for further application to automatic scoring, as they are significantly correlated with speakers' proficiency levels. The results also show that metrics need to be selected dynamically depending on the structure of the target sentences. Results from a preliminary automatic scoring experiment using multiple regression analysis suggest that appropriate handling of unexpected input utterances is also desirable for better performance.
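
The abstract does not list which ten metrics were examined. Several interval-based rhythm metrics are widely used in this literature, and the sketch below implements four of them under their standard definitions. rPVI is the mean absolute difference between successive intervals. nPVI is the same difference divided by the local pair mean, which factors out speech rate. Varco is the standard deviation divided by the mean, and %V is the proportion of vocalic time. The choice of population rather than sample standard deviation is an assumption, and the interval durations are hypothetical.

```python
from statistics import mean, pstdev
from typing import Sequence


def _check(durs: Sequence[float]) -> None:
    if len(durs) < 2:
        raise ValueError("need at least two intervals")


def rpvi(durs: Sequence[float]) -> float:
    """Raw Pairwise Variability Index: mean |d_k - d_(k+1)|."""
    _check(durs)
    diffs = [abs(a - b) for a, b in zip(durs, durs[1:])]
    return sum(diffs) / len(diffs)


def npvi(durs: Sequence[float]) -> float:
    """Normalized PVI: 100 * mean of |d_k - d_(k+1)| / ((d_k + d_(k+1)) / 2)."""
    _check(durs)
    terms = [abs(a - b) / ((a + b) / 2) for a, b in zip(durs, durs[1:])]
    return 100 * sum(terms) / len(terms)


def varco(durs: Sequence[float]) -> float:
    """Rate-normalized variability (e.g. VarcoV): 100 * SD / mean.

    Population SD is used here; some studies use the sample SD instead.
    """
    _check(durs)
    return 100 * pstdev(durs) / mean(durs)


def percent_v(vowel_durs: Sequence[float], cons_durs: Sequence[float]) -> float:
    """%V: share of total (vocalic + consonantal) time that is vocalic."""
    v = sum(vowel_durs)
    return 100 * v / (v + sum(cons_durs))


vowels = [0.12, 0.06, 0.15, 0.05]   # hypothetical vocalic intervals (s)
consonants = [0.08, 0.10, 0.07]     # hypothetical consonantal intervals (s)
print(npvi(vowels), varco(vowels), percent_v(vowels, consonants))
```

Per-utterance values like these could serve as predictors in a multiple regression against proficiency scores, as in the preliminary scoring experiment the abstract describes.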

Designing a large recording script for open-domain English speech synthesis

  • Kim, Sunhee;Kim, Hojeong;Lee, Yooseop;Kim, Boryoung;Won, Yongkook;Kim, Bongwan
    • Phonetics and Speech Sciences / v.13 no.3 / pp.65-70 / 2021
  • This paper proposes a method for designing a large recording script for open-domain English speech synthesis. For read-aloud-style text, 12 domains and 294 sub-domains were designed using text from five different news media publications. For conversational-style text, 4 domains and 36 sub-domains were designed using movie subtitles. The final script consists of 43,013 sentences (27,085 read-aloud-style and 15,928 conversational-style sentences), comprising 549,683 tokens and 38,356 types. The completed script is analyzed using four criteria: word coverage (type coverage and token coverage), high-frequency vocabulary coverage, phonetic coverage (diphone coverage and triphone coverage), and readability. The type coverage of the script reaches 36.86% despite its low token coverage of 2.97%. The high-frequency vocabulary coverage of the script is 73.82%, and the diphone and triphone coverage of the whole script are 86.70% and 38.92%, respectively. The average readability of all sentences is 9.03. The results of the analysis show that the proposed method is effective in producing a large recording script for English speech synthesis, with good coverage of unique words, high-frequency vocabulary, and phonetic units, as well as good readability.
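
Phonetic coverage figures such as the diphone and triphone percentages can be computed as the share of a reference inventory of n-phone units that occurs at least once in the script. The abstract does not say how the reference inventory was defined, so the sketch takes it as an explicit parameter. Units are collected within sentences only, which is also an assumption. The function names and the toy inventory are hypothetical.

```python
from typing import Iterable, List, Set, Tuple


def ngrams(phones: List[str], n: int) -> Set[Tuple[str, ...]]:
    """All contiguous n-phone units (diphones for n=2, triphones for n=3)."""
    return {tuple(phones[i:i + n]) for i in range(len(phones) - n + 1)}


def unit_coverage(script: Iterable[List[str]],
                  reference: Set[Tuple[str, ...]], n: int) -> float:
    """Percentage of reference n-phone units found at least once in the script.

    script: one phone sequence per sentence (e.g. looked up in a pronouncing
    dictionary); units are not counted across sentence boundaries.
    reference: the inventory to be covered (assumed; not given in the abstract).
    """
    if not reference:
        raise ValueError("empty reference inventory")
    seen: Set[Tuple[str, ...]] = set()
    for sentence in script:
        seen |= ngrams(sentence, n)
    return 100.0 * len(seen & reference) / len(reference)


# Toy example: two phones, every possible diphone as the reference inventory.
phones = ["AA", "B"]
all_diphones = {(a, b) for a in phones for b in phones}
script = [["B", "AA", "B"]]
print(unit_coverage(script, all_diphones, 2))  # (B,AA) and (AA,B) of 4 units
```

The same function gives triphone coverage with `n=3` and a triphone reference set.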

Comparing English and Korean speakers' word-final /rl/ clusters using dynamic time warping

  • Cho, Hyesun
    • Phonetics and Speech Sciences / v.14 no.1 / pp.29-36 / 2022
  • The English word-final /rl/ cluster poses a particular problem for Korean learners of English because it is a sequence of two sounds, /r/ and /l/, that are not contrastive in Korean. This study compared the similarity distances between English and Korean speakers' /rl/ productions using the dynamic time warping (DTW) algorithm. Words with /rl/ (pearl, world) and without /rl/ (bird, word) were recorded by four English speakers and four Korean speakers and compared pairwise. The F2-F1 trajectories, the acoustic correlate of velarized /l/, and the F3 trajectories, the acoustic correlate of /r/, were examined. Formant analysis showed that, unlike Korean speakers, English speakers lowered F2-F1 values toward the end of the word, suggesting that /l/ was absent from the Korean speakers' productions. In contrast, there was no significant difference in F3 values. Mixed-effects regression analyses of the DTW distances revealed that Korean speakers produced /r/ similarly to English speakers but failed to produce the velarized /l/ in /rl/ clusters.
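
A DTW distance between two formant trajectories can be computed with textbook dynamic programming, shown below. The abstract does not specify the step pattern, window constraint, or length normalization the study used. This sketch therefore assumes an unconstrained, symmetric, unweighted DTW with absolute-difference local cost. The trajectory values are hypothetical.

```python
import math
from typing import Sequence


def dtw_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """Cumulative DTW cost between two 1-D trajectories.

    Local cost is |x[i] - y[j]|; allowed steps advance x, y, or both by one.
    No window constraint or path-length normalization is applied (assumed).
    """
    n, m = len(x), len(y)
    if n == 0 or m == 0:
        raise ValueError("trajectories must be non-empty")
    # cost[i][j] = best cumulative cost of aligning x[:i] with y[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(x[i - 1] - y[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # advance x only
                                     cost[i][j - 1],      # advance y only
                                     cost[i - 1][j - 1])  # advance both
    return cost[n][m]


# Hypothetical F2-F1 trajectories (Hz) sampled at equal intervals over "pearl".
english = [900.0, 850.0, 700.0, 500.0, 400.0]
korean = [900.0, 880.0, 860.0, 850.0]
print(dtw_distance(english, korean))
```

Because DTW absorbs differences in timing, a large distance here reflects a difference in trajectory shape, such as a missing final F2-F1 fall, rather than a difference in word duration.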

The Use of Phonetics in the Analysis of the Acquisition of Second Language Syntax

  • Fellbaum, Marie
    • Proceedings of the KSPS conference / 1996.10a / pp.430-431 / 1996
  • Among scholars of second language (L2) acquisition who have used prosodic considerations in syntactic analyses, pausing and intonation contours have been used to define utterances in the speech of second language learners (e.g., Sato, 1990). Recent research in conversation analysis has found that lexically marked causal clause combining in the discourse of native speakers can be distinguished as "intonational subordination" and "intonational coordination" (Couper-Kuhlen, E., forthcoming). This study applies Pienemann's Processability Theory (1995) to the speech of native speakers of Japanese (L1) learning English. It shows that, to assess the psycholinguistic stages of syntactic development accurately, pitch, loudness, and timing must all be considered together with the syntactic analysis of interlanguage speech production. Twelve Japanese subjects each participated in eight fifteen-minute interviews, for a total of ninety-six dyads. The speech analyzed in this report is limited to the twelve subjects interacting with two different non-native-speaker interviewers, for a total of twenty-four dyads. Within each interview, four different tasks are analyzed to determine each subject's stage of acquisition of English. The speech is first segmented according to intonation contours and pauses. It is then classified according to specific syntactic units and further analyzed for pitch, loudness, and timing. The results indicate that the speech must first be classified prosodically and lexically before syntactic analysis begins. This analysis distinguishes three interlanguage lexical categories: discourse markers, coordinators/subordinators, and transfer from Japanese. Once these lexical categories have been determined, the psycholinguistic stages of syntactic development can be assessed more accurately.

The Acquisition and Description of Voiceless Stops of Spanish and English

  • Fellbaum, Marie
    • Proceedings of the KSPS conference / 1996.10a / pp.274-274 / 1996
  • This paper presents preliminary results from work in progress on a paired study of the acquisition of voiceless stops by Spanish speakers learning English and by American English speakers learning Spanish. Following Eckman's Markedness Differential Hypothesis, the hypothesis was that the American speakers would have no difficulty suppressing aspiration for Spanish unaspirated stops, whereas the Spanish speakers would have difficulty acquiring the aspiration required for English voiceless stops. The results supported the null hypothesis. All subjects were given the same set of disyllabic real words of English and Spanish in carrier phrases. The tokens analyzed in this report are limited to word-initial voiceless stops followed by a low back vowel in stressed syllables. Tokens were randomized and arranged in a list in which each word appeared three times. Aspiration was measured from the burst to the onset of voicing (VOT). First language (L1) and second language (L2) tokens were compared within each speaker and between the two groups. The results indicate that the Spanish speakers, as a group, were able to reach the accepted target VOT for English, but the English speakers were not able to reach the accepted range for Spanish, despite statistically significant changes (p < .001) by speakers in both groups of learners. Closer analysis of the speech samples revealed wide variability within the speech of native speakers of English. Not only does English VOT vary widely (spanning 120 ms for English labials, for example), but individual speakers also showed different patterns. These results have implications for experimental design, including the number of speakers and tokens required for an adequate description of different languages.
Furthermore, a simple report of means will not distinguish the speakers and their respective language-learning situations; measurements must also include the RANGE of acceptable VOT values for phonetic segments. This has immediate consequences for the learning and teaching of foreign languages involving aspirated stops. In addition, the labelling of spoken language in speech technology is shown to be inadequate without a fuller mathematical description.
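
The argument that means alone hide speaker differences can be made concrete with a minimal sketch. It computes VOT from burst and voicing-onset times and reports the range alongside the mean. The token times and function names are hypothetical. Negative VOT (voicing before the burst) is allowed but not otherwise treated specially, which is a simplifying assumption.

```python
from statistics import mean
from typing import Dict, List, Tuple


def vot_ms(burst_s: float, voicing_onset_s: float) -> float:
    """VOT in ms: time from the release burst to the onset of voicing.

    Negative values (voicing before the burst) are returned unchanged.
    """
    return (voicing_onset_s - burst_s) * 1000.0


def summarize_vot(tokens: List[Tuple[float, float]]) -> Dict[str, float]:
    """Report mean, min, max and range of VOT for one speaker.

    tokens: (burst time, voicing-onset time) pairs in seconds.
    """
    if not tokens:
        raise ValueError("no tokens")
    vots = [vot_ms(burst, onset) for burst, onset in tokens]
    return {
        "mean": mean(vots),
        "min": min(vots),
        "max": max(vots),
        "range": max(vots) - min(vots),
    }


# Two hypothetical speakers with the same mean VOT but very different ranges.
speaker_a = [(0.100, 0.160), (0.300, 0.360), (0.500, 0.560)]
speaker_b = [(0.100, 0.120), (0.300, 0.360), (0.500, 0.600)]
print(summarize_vot(speaker_a))
print(summarize_vot(speaker_b))
```

Both speakers have a mean VOT of about 60 ms. Only the range reveals that speaker B's tokens span roughly 80 ms while speaker A's are nearly constant.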
