• Title/Summary/Keyword: read speech

Search Result 163, Processing Time 0.023 seconds

The realization of English rhythm by Busan Korean speakers

  • Choe, Wook Kyung
    • Phonetics and Speech Sciences
    • /
    • v.11 no.4
    • /
    • pp.81-87
    • /
    • 2019
  • The purpose of the current study is to investigate the realization of speech rhythm in English as spoken by Korean learners of English. The study particularly aims to examine the rhythm metrics of English read speech by learners who speak Busan or the South Kyungsang dialect of Korean. Twenty-four learners whose L1 is Busan Korean and eight native speakers of English read a passage wherein five sentences were segmented and labeled as vocalic and intervocalic intervals. Various rhythm metrics such as %V, Varcos, and Pairwise Variability Indexes (PVIs) were calculated. The results show that Korean learners read English sentences with significantly more vocalic and consonantal intervals at a slower speech rate than native English speakers. The analyses of rhythm metrics revealed that when the speech rate was not normalized, Korean learners' English showed more variability in the length of consonantal and vocalic intervals. However, speech-rate-normalized rhythm metrics for vocalic intervals indicated that Korean learners transferred their L1 rhythmic structures (a syllable-timed language) into their L2 speech (a stress-timed language). Overall, the results suggest that Korean learners' English reflects the rhythmic characteristics of their L1. The effect of the learners' L1 dialect on the realization of L2 speech rhythm is also speculated.

The Acoustic Analysis of Korean Read Speech - with respect to the prosodic phrasing - (한국어 낭독체 문장의 음향분석 -바람과 햇님의 운율구 생성을 중심으로-)

  • Sung Chuljae
    • Proceedings of the KSPS conference
    • /
    • 1996.02a
    • /
    • pp.157-172
    • /
    • 1996
  • This study aims to suggest some theoretical methodology for analysis of the prosodic patterns in Korean Read Speech. The engineering effort relevant to the phonetic study has focused to the importance of prosodic phrasing which may play a major role in analyzing the phonetic DB. Before establishing the prosodic phrase as the prosodic unit, we should describe the features of the boundary signal in a target sentence. With this in mind, the general characteristics of Read Speech and the ToBI(tones and Break Indices), which has been currently in vogue with respect to the prosodic labelling system were presented as the first step. The concrete analysis was carried out with the fable 'North Wind and the Sun' Korean version, where about 25 prosodic units were discriminated by perceptual approach for 5 subjects. Establishing various informations which can be used for deciding a boundary position systematically, we can proceed to the next, viz. acoustic analysis of prosodic unit. The most important which we primarily study for improving the naturalness of synthetic speech may be, at first, detecting the boundary signals in the speech file and accordingly reestablishment it within the raw text.

  • PDF

한국어 낭독체 발화의 경계 인식에 있어서 묵음 휴지(Silent pause)의 역할

  • Jo, Hyeong-Sil
    • Proceedings of the KSPS conference
    • /
    • 2006.11a
    • /
    • pp.117-119
    • /
    • 2006
  • This paper discusses the importance of silent pauses in the perception of prosodic boundaries in Korean speech. It is suggested that in speech in general, and in particular in spontaneous speech, silent pauses are neither necessary nor sufficient for the perception of prosodic boundaries. In read speech, however, there is a high correlation between the presence of a pause and the perception of a boundary. An experiment was carried out to determine whether removing the silent boundary from an extract of speech had a significant effect on the perception of boundaries in Korean read speech. Results suggest that while the presence of a silent boundary slightly reinforces the perception of a prosodic boundary, subjects are in general capable of perceiving the boundary without the silent pause.

  • PDF

A Comparative Study on the Male and Female Vowel Formants of the Korean Corpus of Spontaneous Speech (한국어 자연발화 음성코퍼스의 남녀 모음 포먼트 비교 연구)

  • Yoon, Kyuchul;Kim, Soonok
    • Phonetics and Speech Sciences
    • /
    • v.7 no.2
    • /
    • pp.131-138
    • /
    • 2015
  • The aim of this work is to compare the vowel formants of the ten adult female speakers in their twenties and thirties from the Seoul corpus[7] with those of corresponding Korean male speakers from the same corpus and of American female speakers from the Buckeye corpus[4]. In addition, various linguistic factors that are expected affect the formant frequencies were examined to account for the distribution of the vowel formants. Formant frequencies extracted from the Seoul corpus were also compared to those from read speech. The results showed that the formant distribution of the spontaneous speech was very different from that of the read speech, while the comparison between the female and male speakers was similar in both languages. To a greater or lesser degree, the potential linguistic factors influenced the formant frequencies of the vowels.

Characteristics of voice quality on clear versus casual speech in individuals with Parkinson's disease (명료발화와 보통발화에서 파킨슨병환자 음성의 켑스트럼 및 스펙트럼 분석)

  • Shin, Hee-Baek;Shim, Hee-Jeong;Jung, Hun;Ko, Do-Heung
    • Phonetics and Speech Sciences
    • /
    • v.10 no.2
    • /
    • pp.77-84
    • /
    • 2018
  • The purpose of this study is to examine the acoustic characteristics of Parkinsonian speech, with respect to different utterance conditions, by employing acoustic/auditory-perceptual analysis. The subjects of the study were 15 patients (M=7, F=8) with Parkinson's disease who were asked to read out sentences under different utterance conditions (clear/casual). The sentences read out by each subject were recorded, and the recorded speech was subjected to cepstrum and spectrum analysis using Analysis of Dysphonia in Speech and Voice (ADSV). Additionally, auditory-perceptual evaluation of the recorded speech was conducted with respect to breathiness and loudness. Results indicate that in the case of clear speech, there was a statistically significant increase in the cepstral peak prominence (CPP), and a decrease in the L/H ratio SD (ratio of low to high frequency spectral energy SD) and CPP F0 SD values. In the auditory-perceptual evaluation, a decrease in breathiness and an increase in loudness were noted. Furthermore, CPP was found to be highly correlated to breathiness and loudness. This provides objective evidence of the immediate usefulness of clear speech intervention in improving the voice quality of Parkinsonian speech.

The Noise Effect on Stuttering and Overall Speech Rate: Multi-talker Babble Noise (다화자잡음이 말더듬의 비율과 말속도에 미치는 영향)

  • Park, Jin;Chung, In-Kie
    • Phonetics and Speech Sciences
    • /
    • v.4 no.2
    • /
    • pp.121-126
    • /
    • 2012
  • This study deals with how stuttering changes in its frequency in a situation where adult participants who stutter are exposed to one type of background noise, that is, multi-talker babble noise. Eight American English-speaking adults who stutter participated in this study. Each of the subjects read aloud sentences under each of three speaking conditions (i.e., typical solo reading (TSR), typical choral reading (TCR), and multi-talker babble noise reading (BNR)). Speech fluency was computed based on a percentage of syllables stuttered (%SS) and speaking rate was also assessed to examine if there was significant change in rates as a measure of vocal change under each of the speaking conditions. The study found that participants read more fluently both during BNR and during TCR than during TSR. The study also found that participants did not show significant changes in speaking rate across the three speaking conditions. Some discussion was provided in relation to the effect of multi-talker babble noise on the frequency of stuttering and its further speculation.

Prediction of Break Indices in Korean Read Speech (국어 낭독체 발화의 운율경계 예측)

  • Kim Hyo Sook;Kim Chung Won;Kim Sun Ju;Kim Seoncheol;Kim Sam Jin;Kwon Chul Hong
    • MALSORI
    • /
    • no.43
    • /
    • pp.1-9
    • /
    • 2002
  • This study aims to model Korean prosodic phrasing using CART(classification and regression tree) method. Our data are limited to Korean read speech. We used 400 sentences made up of editorials, essays, novels and news scripts. Professional radio actress read 400sentences for about two hours. We used K-ToBI transcription system. For technical reason, original break indices 1,2 are merged into AP. Differ from original K-ToBI, we have three break index Zero, AP and IP. Linguistic information selected for this study is as follows: the number of syllables in ‘Eojeol’, the location of ‘Eojeol’ in sentence and part-of-speech(POS) of adjacent ‘Eojeol’s. We trained CART tree using above information as variables. Average accuracy of predicting NonIP(Zero and AP) and IP was 90.4% in training data and 88.5% in test data. Average prediction accuracy of Zero and AP was 79.7% in training data and 78.7% in test data.

  • PDF

Automatic proficiency assessment of Korean speech read aloud by non-natives using bidirectional LSTM-based speech recognition

  • Oh, Yoo Rhee;Park, Kiyoung;Jeon, Hyung-Bae;Park, Jeon Gue
    • ETRI Journal
    • /
    • v.42 no.5
    • /
    • pp.761-772
    • /
    • 2020
  • This paper presents an automatic proficiency assessment method for a non-native Korean read utterance using bidirectional long short-term memory (BLSTM)-based acoustic models (AMs) and speech data augmentation techniques. Specifically, the proposed method considers two scenarios, with and without prompted text. The proposed method with the prompted text performs (a) a speech feature extraction step, (b) a forced-alignment step using a native AM and non-native AM, and (c) a linear regression-based proficiency scoring step for the five proficiency scores. Meanwhile, the proposed method without the prompted text additionally performs Korean speech recognition and a subword un-segmentation for the missing text. The experimental results indicate that the proposed method with prompted text improves the performance for all scores when compared to a method employing conventional AMs. In addition, the proposed method without the prompted text has a fluency score performance comparable to that of the method with prompted text.

An acoustic and perceptual investigation of the vowel length contrast in Korean

  • Lee, Goun;Shin, Dong-Jin
    • Phonetics and Speech Sciences
    • /
    • v.8 no.1
    • /
    • pp.37-44
    • /
    • 2016
  • The goal of the current study is to investigate how the sound change is reflected in production or in perception, and what the effect of lexical frequency is on the loss of sound contrasts. Specifically, the current study examined whether the vowel length contrasts are retained in Korean speakers' productions, and whether Korean listeners can distinguish vowel length minimal pairs in their perception. Two production experiments and two perception experiments investigated this. For production tests, twelve Korean native speakers in their 20s and 40s completed a read-aloud task as well as a map-task. The results showed that, regardless of their age group, all Korean speakers produced vowel length contrasts with a small but significant differences in the read-aloud test. Interestingly, the difference between long and short vowels has disappeared in the map task, indicating that the speech mode affects producing vowel length contrasts. For perception tests, thirty-three Korean listeners completed a discrimination and a forced-choice identification test. The results showed that Korean listeners still have a perceptual sensitivity to distinguish lexical meaning of the vowel length minimal pair. We also found that the identification accuracy was affected by the word frequency, showing a higher identification accuracy in high- and mid- frequency words than low frequency words. Taken together, the current study demonstrated that the speech mode (read-aloud vs. spontaneous) affects the production of the sound undergoing a language change; and word frequency affects the sound change in speech perception.

Designing a large recording script for open-domain English speech synthesis

  • Kim, Sunhee;Kim, Hojeong;Lee, Yooseop;Kim, Boryoung;Won, Yongkook;Kim, Bongwan
    • Phonetics and Speech Sciences
    • /
    • v.13 no.3
    • /
    • pp.65-70
    • /
    • 2021
  • This paper proposes a method for designing a large recording script for open domain English speech synthesis. For read-aloud style text, 12 domains and 294 sub-domains were designed using text contained in five different news media publications. For conversational style text, 4 domains and 36 sub-domains were designed using movie subtitles. The final script consists of 43,013 sentences, 27,085 read-aloud style sentences, and 15,928 conversational style sentences, consisting of 549,683 tokens and 38,356 types. The completed script is analyzed using four criteria: word coverage (type coverage and token coverage), high-frequency vocabulary coverage, phonetic coverage (diphone coverage and triphone coverage), and readability. The type coverage of our script reaches 36.86% despite its low token coverage of 2.97%. The high-frequency vocabulary coverage of the script is 73.82%, and the diphone coverage and triphone coverage of the whole script is 86.70% and 38.92%, respectively. The average readability of whole sentences is 9.03. The results of analysis show that the proposed method is effective in producing a large recording script for English speech synthesis, demonstrating good coverage in terms of unique words, high-frequency vocabulary, phonetic units, and readability.