• Title/Summary/Keyword: Corpus phonetics

Design and Construction of Korean-Spoken English Corpus (K-SEC) (한국인의 영어 음성 코퍼스 설계 및 구축)

  • Rhee Seok-Chae; Lee Sook-Hyang; Kang Seok-keun; Lee Yong-Ju
    • MALSORI / no.46 / pp.159-174 / 2003
  • K-SEC (Korean-Spoken English Corpus) is a speech database currently under construction by the authors of this paper. This article discusses why various academic disciplines and industrial circles need the K-SEC, and it introduces the characteristics of the K-SEC design and the catalogue and contents of the recorded database, illustrating what was taken into account from the phonetics and phonology of both Korean and English. The K-SEC can be regarded as the beginning of a parallel speech corpus, and the authors suggest that similar corpora should be expanded for future advances in experimental phonetics and speech information technology.

Design and Construction of Korean-Spoken English Corpus (K-SEC) (한국인의 영어 음성코퍼스 설계 및 구축)

  • Rhee Seok-Chae; Lee Sook-Hyang; Kang Seok-keun; Lee Yong-Ju
    • Proceedings of the KSPS conference / 2003.05a / pp.12-20 / 2003
  • K-SEC (Korean-Spoken English Corpus) is a speech database currently under construction by the authors of this paper. This article discusses why various academic disciplines and industrial circles need the K-SEC, and it introduces the characteristics of the K-SEC design and the catalogue and contents of the recorded database, illustrating what was taken into account from the phonetics and phonology of both Korean and English. The K-SEC can be regarded as the beginning of a parallel speech corpus, and the authors suggest that similar corpora should be expanded for future advances in experimental phonetics and speech information technology.

A Study on the Male Vowel Formants of the Korean Corpus of Spontaneous Speech (한국어 자연발화 음성코퍼스의 남성 모음 포먼트 연구)

  • Kim, Soonok; Yoon, Kyuchul
    • Phonetics and Speech Sciences / v.7 no.2 / pp.95-102 / 2015
  • The purpose of this paper is to extract the vowel formants of ten adult male speakers in their twenties and thirties from the Korean Corpus of Spontaneous Speech [4], also known as the Seoul Corpus, and to analyze them in terms of the various linguistic factors expected to affect the formant distribution, comparing them with earlier work on the Buckeye Corpus of Conversational Speech [1]. The vowels extracted from the Korean corpus were also compared with those of read Korean speech. The results showed that the distribution of vowel formants in the Korean corpus was very different from that of read Korean speech, and the comparison between the English corpus and read English speech showed a similar pattern. The factors affecting the Korean vowel formants were the sex of the interviewer, the position of the target vowel (or the syllable containing it) within the phrasal word or utterance, and the speech rate of the surrounding words.

A study on the voice onset times of the Seoul Corpus males in their twenties (서울 코퍼스 20대 남성의 성대진동 개시시간 연구)

  • Lee, Yuri; Yoon, Kyuchul
    • Phonetics and Speech Sciences / v.8 no.4 / pp.1-8 / 2016
  • The purpose of this work is to examine the voice onset times (VOTs) of the three types of plosives produced by male speakers in their twenties in the Seoul Corpus. In addition, the factors known to affect VOT were analyzed, including place and manner of articulation, speaker, position in the word, type of following vowel, and speech rate calculated over three consecutive words. Many of the findings agreed with those of earlier studies on Korean and other languages, and new observations were also made.
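A minimal sketch of the two derived measures mentioned above, assuming word boundaries and syllable counts are already available from the corpus labels: VOT taken as voicing onset minus burst release, and a local speech rate computed over the target word and its two neighbours. The `Word` record, the window definition (target plus immediate neighbours, truncated at utterance edges), and the choice to sum word durations so that pauses are excluded are all hypothetical illustrations, not the authors' procedure.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Word:
    label: str
    start: float    # seconds
    end: float      # seconds
    syllables: int  # syllable count (assumed available from the transcription)


def vot_ms(burst_release: float, voicing_onset: float) -> float:
    """VOT in milliseconds: voicing onset minus burst release (both in seconds).
    A negative value means voicing began before the release (prevoicing)."""
    return (voicing_onset - burst_release) * 1000.0


def local_speech_rate(words: list[Word], i: int) -> float:
    """Syllables per second over words[i] and its immediate neighbours.

    The window is truncated at utterance edges, and durations are summed word
    by word, so silent pauses between the words are excluded (an assumption).
    """
    window = words[max(0, i - 1): i + 2]
    syllables = sum(w.syllables for w in window)
    duration = sum(w.end - w.start for w in window)
    if duration <= 0:
        raise ValueError("window has zero duration")
    return syllables / duration


if __name__ == "__main__":
    # Hypothetical labels: one utterance of four monosyllabic words.
    utt = [Word("a", 0.00, 0.10, 1), Word("b", 0.10, 0.40, 1),
           Word("c", 0.40, 0.60, 1), Word("d", 0.60, 1.00, 1)]
    print(local_speech_rate(utt, 1))  # rate around the second word
    print(vot_ms(1.000, 1.065))       # VOT of a hypothetical stop token
```

Whether pauses should count toward the window duration is a design choice; including them would lower the rate for words adjacent to a pause.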

A Comparative Study on the Male and Female Vowel Formants of the Korean Corpus of Spontaneous Speech (한국어 자연발화 음성코퍼스의 남녀 모음 포먼트 비교 연구)

  • Yoon, Kyuchul; Kim, Soonok
    • Phonetics and Speech Sciences / v.7 no.2 / pp.131-138 / 2015
  • The aim of this work is to compare the vowel formants of ten adult female speakers in their twenties and thirties from the Seoul Corpus [7] with those of the corresponding Korean male speakers from the same corpus and with those of American female speakers from the Buckeye Corpus [4]. In addition, various linguistic factors expected to affect formant frequencies were examined to account for the distribution of the vowel formants. Formant frequencies extracted from the Seoul Corpus were also compared with those from read speech. The results showed that the formant distribution of spontaneous speech was very different from that of read speech, while the female-male comparison showed similar patterns in both languages. The potential linguistic factors influenced the formant frequencies of the vowels to varying degrees.

Growth curve modeling of nucleus F0 on Korean accentual phrase

  • Yoon, Tae-Jin
    • Phonetics and Speech Sciences / v.9 no.3 / pp.17-23 / 2017
  • The present study investigates the effect of the Accentual Phrase on F0 using a subset of a large-scale corpus of Seoul Korean. Four-syllable words that were neither preceded nor followed by a silent pause were presumed to be canonical exemplars of Accentual Phrases in Korean, and these words were extracted from female speakers' speech samples. Growth curve analysis, a combination of regression and polynomial curve fitting, was applied to them. The four-syllable words were divided into four groups according to the category of the initial segment: voiceless obstruents, voiced obstruents, sonorants, and vowels. The results indicate that the type of initial segment affects F0 (in semitones) in the nucleus of the initial syllable, and the cubic polynomial term revealed that some of the medial low tones in the four-syllable words may be guided by the principle of contrast maximization, while others may be governed by the principle of ease of articulation.
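The semitone scale and the orthogonal polynomial time terms behind a growth curve analysis can be sketched as follows. This is an illustration under stated assumptions: the 100 Hz reference, equally spaced F0 samples across the nucleus, and the helper names are hypothetical, and the fit is an ordinary least-squares fit of a single contour, omitting the random effects that a full (multilevel) growth curve analysis typically includes. Because the time terms are orthogonal, each coefficient can be read independently: the linear term as overall slope, the quadratic as curvature, and the cubic as the inflection-like component on which the study's interpretation rests.

```python
from __future__ import annotations

import math


def hz_to_semitones(f0_hz: float, ref_hz: float = 100.0) -> float:
    """F0 in semitones relative to ref_hz (the 100 Hz reference is an assumption)."""
    return 12.0 * math.log2(f0_hz / ref_hz)


def orthogonal_time_terms(n: int, degree: int = 3) -> list[list[float]]:
    """Orthogonal polynomial time terms over n equally spaced samples.

    Built by (modified) Gram-Schmidt on 1, t, t^2, ..., t^degree, so the
    returned vectors are mutually orthogonal: constant, linear, quadratic, cubic.
    """
    if n <= degree:
        raise ValueError("need more samples than the polynomial degree")
    basis: list[list[float]] = []
    for k in range(degree + 1):
        w = [float(t) ** k for t in range(n)]
        for b in basis:
            proj = sum(x * y for x, y in zip(w, b)) / sum(y * y for y in b)
            w = [x - proj * y for x, y in zip(w, b)]
        basis.append(w)
    return basis


def fit_contour(f0_st: list[float], degree: int = 3) -> list[float]:
    """Least-squares coefficients of one F0 contour on the orthogonal time terms.

    With an orthogonal basis, each coefficient is an independent projection,
    so no matrix inversion is needed. Index 0 is the mean; 1..3 are the
    linear, quadratic and cubic terms.
    """
    basis = orthogonal_time_terms(len(f0_st), degree)
    return [sum(y * b for y, b in zip(f0_st, bk)) / sum(b * b for b in bk)
            for bk in basis]


if __name__ == "__main__":
    # Hypothetical F0 samples (Hz) across one syllable nucleus.
    contour_hz = [210.0, 205.0, 198.0, 192.0, 190.0, 193.0, 199.0]
    st = [hz_to_semitones(f) for f in contour_hz]
    print(fit_contour(st))
```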

Corpus-based evaluation of French text normalization (코퍼스 기반 프랑스어 텍스트 정규화 평가)

  • Kim, Sunhee
    • Phonetics and Speech Sciences / v.10 no.3 / pp.31-39 / 2018
  • This paper presents a taxonomy of non-standard words (NSWs) for developing a French text normalization system and proposes a corpus-based method for evaluating such a system. The proposed taxonomy of French NSWs consists of 13 categories, including 2 letter-based categories and 9 number-based categories. To evaluate the text normalization system, a representative test set containing NSWs from various text domains (news, literature, non-fiction, social networking services (SNSs), and transcriptions) is constructed, and an evaluation equation is proposed that reflects the distribution of NSW categories in the target domain to which the system is applied. The error rate on the test set is 1.64%, while the error rate for the whole corpus, which reflects the corpus's NSW distribution, is 2.08%. The results show that the literature and SNS domains are assessed as having higher error rates than the test set.
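One plausible reading of an evaluation equation that reflects the distribution of NSW categories in the target domain is a weighted average: the per-category error rates measured on the test set, each weighted by that category's share of NSWs in the target domain. The sketch below assumes that form; the paper's actual equation may differ, and the category names and counts in the usage are hypothetical.

```python
from __future__ import annotations


def weighted_error_rate(category_error: dict[str, float],
                        domain_count: dict[str, int]) -> float:
    """Expected NSW error rate for a target domain (assumed weighted-average form).

    category_error: error rate per NSW category, measured on the test set.
    domain_count:   number of NSWs per category observed in the target domain.
    Each category's error rate is weighted by its share of the domain's NSWs.
    """
    total = sum(domain_count.values())
    if total == 0:
        raise ValueError("the target domain contains no NSWs")
    missing = set(domain_count) - set(category_error)
    if missing:
        raise KeyError(f"no test-set error rate for categories: {sorted(missing)}")
    return sum(category_error[c] * n / total for c, n in domain_count.items())


if __name__ == "__main__":
    # Hypothetical categories and numbers, for illustration only.
    errors = {"CARDINAL": 0.01, "DATE": 0.05}
    sns_counts = {"CARDINAL": 30, "DATE": 10}
    print(f"{weighted_error_rate(errors, sns_counts):.4f}")  # prints 0.0200
```

Under this form, a domain rich in hard categories scores worse than the balanced test set even with identical per-category performance, which is consistent with the kind of domain differences the abstract reports.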

Monophthong Analysis on a Large-scale Speech Corpus of Read-Style Korean (한국어 대용량발화말뭉치의 단모음분석)

  • Yoon, Tae-Jin; Kang, Yoonjung
    • Phonetics and Speech Sciences / v.6 no.3 / pp.139-145 / 2014
  • The paper describes methods for conducting vowel analysis on a large-scale corpus with the aid of forced alignment and the optimal formant ceiling method. The 'Read Style Corpus of Standard Korean' is used to build the forced alignment system, and a subset of the corpus is used to process and extract features for vowel analysis based on the optimal formant ceiling. The results of the vowel analysis are reliable and comparable to those obtained with traditional analytical methods. The findings indicate that the methods adopted can be extended to more fine-grained analyses without time-consuming manual labeling and without loss of accuracy or reliability.
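The optimal formant ceiling idea can be sketched as a selection step: formants are measured with several candidate ceilings (the maximum-formant setting of Praat's Burg formant analysis), and the ceiling that yields the most compact measurements for a given speaker and vowel is kept. The criterion below, the smallest summed variance of log F1 and log F2 across tokens, is an assumption chosen for illustration; the study's exact criterion is not stated in the abstract, and the function names are hypothetical.

```python
from __future__ import annotations

import math
from statistics import pvariance


def spread(tokens: list[tuple[float, float]]) -> float:
    """Summed variance of log F1 and log F2 across tokens (log scale so that
    F2 does not dominate simply because it is larger in Hz)."""
    log_f1 = [math.log(f1) for f1, _ in tokens]
    log_f2 = [math.log(f2) for _, f2 in tokens]
    return pvariance(log_f1) + pvariance(log_f2)


def optimal_ceiling(measurements: dict[float, list[tuple[float, float]]]) -> float:
    """Pick the formant ceiling (Hz) whose (F1, F2) measurements for one
    speaker's vowel category are most compact under the assumed criterion."""
    if not measurements:
        raise ValueError("no candidate ceilings")
    return min(measurements, key=lambda ceiling: spread(measurements[ceiling]))


if __name__ == "__main__":
    # Hypothetical (F1, F2) tokens of one vowel measured at two ceilings.
    by_ceiling = {
        5000: [(300.0, 2300.0), (310.0, 2250.0), (305.0, 2280.0)],
        5500: [(300.0, 2300.0), (600.0, 1500.0), (320.0, 2700.0)],
    }
    print(optimal_ceiling(by_ceiling))
```

Re-running the formant analysis at each candidate ceiling happens outside this sketch; only the selection logic is shown.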

Error Correction and Praat Script Tools for the Buckeye Corpus of Conversational Speech (벅아이 코퍼스 오류 수정과 코퍼스 활용을 위한 프랏 스크립트 툴)

  • Yoon, Kyu-Chul
    • Phonetics and Speech Sciences / v.4 no.1 / pp.29-47 / 2012
  • The purpose of this paper is to show how to convert the label files of the Buckeye Corpus of Conversational Speech [1] into Praat format and to introduce Praat scripts that enable linguists to study various aspects of the spoken American English in the corpus. During the conversion process, several types of errors were identified and corrected, either manually or automatically with scripts. The Praat script tools developed here can extract large amounts of phonetic measurements from the corpus, such as the VOT of plosives, vowel formants, word frequency information, and speech rates spanning several consecutive words. The tools can also extract additional information about the phonetic environment of the target words or allophones.
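A minimal sketch of the conversion step, assuming the Buckeye label layout as commonly described: a header terminated by a line containing only `#`, followed by lines of the form `<end time> <color> <label>`, where each time marks the end of a segment. It writes a one-tier Praat TextGrid in the long text format. Entries whose time does not advance are simply skipped here, which is a much cruder stand-in for the error correction the paper describes; the function names are hypothetical.

```python
from __future__ import annotations


def parse_buckeye_labels(text: str) -> list[tuple[float, float, str]]:
    """Parse Buckeye-style label text into contiguous (start, end, label) intervals.

    Assumed layout: header lines up to a line containing only '#', then lines
    '<end_time> <color> <label>', where each time is the END of a segment.
    """
    intervals: list[tuple[float, float, str]] = []
    in_body = False
    prev_end = 0.0
    for line in text.splitlines():
        if not in_body:
            in_body = line.strip() == "#"
            continue
        fields = line.split(None, 2)
        if len(fields) < 2:
            continue  # blank or malformed line
        end = float(fields[0])
        label = fields[2].strip() if len(fields) == 3 else ""
        if end <= prev_end:
            continue  # zero-length or out-of-order entry: skipped in this sketch
        intervals.append((prev_end, end, label))
        prev_end = end
    return intervals


def to_textgrid(intervals: list[tuple[float, float, str]], tier: str = "phones") -> str:
    """Render contiguous intervals as a one-tier Praat TextGrid (long text format)."""
    if not intervals:
        raise ValueError("a tier needs at least one interval")
    xmax = intervals[-1][1]
    out = [
        'File type = "ooTextFile"',
        'Object class = "TextGrid"',
        "",
        "xmin = 0",
        f"xmax = {xmax}",
        "tiers? <exists>",
        "size = 1",
        "item []:",
        "    item [1]:",
        '        class = "IntervalTier"',
        f'        name = "{tier}"',
        "        xmin = 0",
        f"        xmax = {xmax}",
        f"        intervals: size = {len(intervals)}",
    ]
    for i, (start, end, label) in enumerate(intervals, start=1):
        text = label.replace('"', '""')  # Praat doubles embedded quotes
        out += [
            f"        intervals [{i}]:",
            f"            xmin = {start}",
            f"            xmax = {end}",
            f'            text = "{text}"',
        ]
    return "\n".join(out) + "\n"
```

Usage: read a `.phones` or `.words` file as text, pass it to `parse_buckeye_labels`, and write the string returned by `to_textgrid` to a `.TextGrid` file that Praat can open.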

A Study on the Voice Onset Time of English Voiceless Stops in the Buckeye Corpus (벅아이 코퍼스를 이용한 영어 무성파열음의 VOT 연구)

  • Yoon, Kyu-Chul
    • Phonetics and Speech Sciences / v.4 no.2 / pp.33-40 / 2012
  • The purpose of this paper is to investigate the voice onset time (VOT) of the English voiceless stops [p, t, k] in the Buckeye Corpus of Conversational Speech [1]. Three young female speakers were chosen for this study, and their VOT values were extracted semi-automatically along with other factors. The factors used in the analysis were place of articulation, position in the word, syllabic stress, whether the word is a content word, word frequency calculated from the corpus, and speech rate expressed in syllables per second. The results showed that all the factors had a statistically significant effect on VOT for the three places of articulation of each speaker. The study is significant in that the materials analyzed come from a corpus of spontaneous, natural English speech.