• Title/Summary/Keyword: Speech Corpus

A study of flaps in American English based on the Buckeye Corpus (Buckeye corpus에 나타난 탄설음화 현상 분석)

  • Hwang, Byeonghoo;Kang, Seokhan
    • Phonetics and Speech Sciences / v.10 no.3 / pp.9-18 / 2018
  • This paper presents an acoustic and phonological study of alveolar flaps in American English. Based on the Buckeye Corpus, flap tokens produced by twenty men are analyzed at both the lexical and post-lexical levels. The data, analyzed with the Praat speech analysis software, include duration, F2, and F3 in the voicing during the flap, as well as duration, F1, F2, F3, and f0 in the adjacent vowels. The results provide evidence on two issues: (1) the different ways in which voiced and voiceless alveolar stops give rise to neutralized flaps at the lexical and post-lexical levels, and (2) the extent to which vowel features (height, frontness, and tenseness) affect flaps. The results show that flaps are affected by pre-consonantal vowel features at the lexical as well as post-lexical levels. Unlike previous studies, this study uses Praat to distinguish flapped from unflapped tokens in the Buckeye Corpus and examines connections between the lexical and post-lexical levels.

A study on the release burst spectra of the voiceless plosives from the English and Korean spontaneous speech corpus (영어와 한국어 자연발화 코퍼스에서의 무성 폐쇄음 개방 파열 스펙트럼 연구)

  • Hwang, Sunmi;Yoon, Kyuchul
    • Phonetics and Speech Sciences / v.9 no.4 / pp.27-34 / 2017
  • The purpose of this work is to examine the English and Korean voiceless plosives from the Buckeye[15] and Seoul[16] corpora in terms of their static spectral characteristics. The plosives were automatically extracted by a Praat script. To estimate the percentage of plosives classified correctly, discriminant analyses were performed whose training was based on four spectral moments, i.e. the center of gravity, variance, skewness, and kurtosis, as suggested in [6]. Another set of discriminant analyses was performed based on the spectral tilts. In the last set of analyses, the spectral moments and tilts were both used in the training. Results showed that the correct classification rate did not exceed about 65% in the best case, which suggests that phonetic cues other than the release burst, including dynamic spectral aspects and vowel-onset cues, would be necessary.
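The four spectral moments named in this abstract can be illustrated with a minimal sketch. It treats a magnitude spectrum as a probability distribution over frequency. The function name `spectral_moments`, the `power` weighting, and the choice to report excess kurtosis (minus 3) are assumptions made for illustration; the abstract does not say how the authors' Praat script computed these values.

```python
import math


def spectral_moments(freqs_hz, magnitudes, power=2.0):
    """Return (center of gravity, variance, skewness, excess kurtosis).

    The spectrum is normalized into a distribution over frequency, using
    |magnitude| ** power as the weight of each frequency bin.
    """
    weights = [abs(m) ** power for m in magnitudes]
    total = sum(weights)
    if total == 0:
        raise ValueError("spectrum has no energy")
    probs = [w / total for w in weights]

    # First moment: the weighted mean frequency (center of gravity).
    cog = sum(f * p for f, p in zip(freqs_hz, probs))

    def central_moment(k):
        return sum(((f - cog) ** k) * p for f, p in zip(freqs_hz, probs))

    variance = central_moment(2)
    if variance == 0:
        raise ValueError("all energy is in a single frequency bin")

    skewness = central_moment(3) / math.sqrt(variance) ** 3
    kurtosis = central_moment(4) / variance ** 2 - 3.0  # excess kurtosis
    return cog, variance, skewness, kurtosis
```

For a spectrum that is symmetric around its peak, for example `spectral_moments([1000, 2000, 3000], [1, 2, 1])`, the center of gravity sits at the peak and the skewness is zero; a burst with more high-frequency energy would shift the center of gravity upward and make the skewness negative.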

Corpus-based Korean Text-to-speech Conversion System (콜퍼스에 기반한 한국어 문장/음성변환 시스템)

  • Kim, Sang-hun;Park, Jun;Lee, Young-jik
    • The Journal of the Acoustical Society of Korea / v.20 no.3 / pp.24-33 / 2001
  • This paper describes a baseline implementation of a corpus-based Korean TTS system. Conventional TTS systems that use small speech databases still generate machine-like synthetic speech. To overcome this problem, we introduce a corpus-based TTS system that can generate natural synthetic speech without prosodic modifications. The corpus should contain the natural prosody of the source speech and multiple instances of each synthesis unit. To make phone-level synthesis units, we train a speech recognizer on the target speech and then perform automatic phoneme segmentation. We also detect the fine pitch period using laryngograph signals, which is used for prosodic feature extraction. For break strength allocation, four levels of break indices are determined from pause length and attached to phones to reflect prosodic variations at phrase boundaries. To predict the break strength from text, we use the statistical information of POS (part-of-speech) sequences. The best triphone sequences are selected by a Viterbi search that minimizes the accumulated Euclidean distance of the concatenation distortion. To obtain high-quality synthetic speech suitable for commercial use, we introduce a domain-specific database. By adding the domain-specific database to the general-domain database, we can greatly improve the quality of synthetic speech in a specific domain. In a subjective evaluation, the new Korean corpus-based TTS system shows better naturalness than the conventional demisyllable-based one.
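The Viterbi selection step described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the candidate format `(unit_id, left_edge_features, right_edge_features)`, the function names, and the feature vectors are all hypothetical, and only the concatenation cost mentioned in the abstract is modeled.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def select_units(candidates):
    """Pick one candidate per position with minimal total join distortion.

    candidates[t] is a list of (unit_id, left_edge, right_edge) tuples.
    Returns (total_cost, [unit_id, ...]).
    """
    if not candidates or any(not position for position in candidates):
        raise ValueError("every position needs at least one candidate")

    costs = [0.0] * len(candidates[0])  # best cost ending in each candidate
    backpointers = []
    for t in range(1, len(candidates)):
        new_costs, pointers = [], []
        for _, left_edge, _ in candidates[t]:
            # Cheapest predecessor for this candidate.
            best_i, best_cost = min(
                ((i, costs[i] + euclidean(prev_right, left_edge))
                 for i, (_, _, prev_right) in enumerate(candidates[t - 1])),
                key=lambda pair: pair[1],
            )
            new_costs.append(best_cost)
            pointers.append(best_i)
        costs = new_costs
        backpointers.append(pointers)

    # Trace back from the cheapest final candidate.
    j = min(range(len(costs)), key=costs.__getitem__)
    total = costs[j]
    path = [j]
    for pointers in reversed(backpointers):
        j = pointers[j]
        path.append(j)
    path.reverse()
    return total, [candidates[t][j][0] for t, j in enumerate(path)]
```

Because the search keeps the best cost for every candidate at every position, it can prefer a slightly worse first join when that leads to a much better join later, which a greedy left-to-right choice would miss.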

Generating a Category Set of Words Using a Hierarchical Part-of-speech System and Tagged Corpus

  • Kojima, Takeyuki;Kotani, Yoshiyuki
    • Proceedings of the Korean Society for Language and Information Conference / 2002.02a / pp.217-226 / 2002
  • In this paper, we propose a method of generating a proper categorization of morphemes, given a hierarchical part-of-speech system and a corpus tagged using this part-of-speech system. Our method uses hierarchical information in the part-of-speech system and statistical information in the corpus to generate a category set. The statistical information is based on the context of occurrence of categories. First, we specify the format of the given information. Then, we describe an algorithm to generate a proper categorization. Finally, we present the results of our experiments in applying this method. We obtained a moderately proper categorization and found several candidates for improvement.

N-gram Adaptation Using Information Retrieval and Dynamic Interpolation Coefficient (정보검색 기법과 동적 보간 계수를 이용한 N-gram 언어모델의 적응)

  • Choi Joon Ki;Oh Yung-Hwan
    • MALSORI / no.56 / pp.207-223 / 2005
  • The goal of language model adaptation is to improve the background language model with a relatively small adaptation corpus. This study presents a language model adaptation technique for the case where no additional text data for adaptation exist. We propose using an information retrieval (IR) technique with N-gram language modeling to collect the adaptation corpus from the baseline text data. We also propose using a dynamic language model interpolation coefficient to combine the background language model and the adapted language model. The interpolation coefficient is estimated from the word hypotheses obtained by segmenting the input speech data reserved as held-out validation data. This allows the final adapted model to improve the performance of the background model consistently. The proposed approach reduces the word error rate by 13.6% relative to the baseline 4-gram model for two-hour broadcast news speech recognition.
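Combining a background and an adapted language model with an interpolation coefficient can be sketched as below. This uses standard linear interpolation with an EM update of the coefficient on held-out word probabilities; it is an assumption that this resembles the paper's estimation procedure, which the abstract does not spell out. The function names and the input format (pairs of per-word probabilities from the two models) are hypothetical.

```python
def interpolate(p_adapted, p_background, lam):
    """Linear interpolation: lam * P_adapted + (1 - lam) * P_background."""
    return lam * p_adapted + (1.0 - lam) * p_background


def estimate_lambda(prob_pairs, lam=0.5, iterations=20):
    """Estimate the interpolation coefficient by EM on held-out data.

    prob_pairs holds one (p_adapted, p_background) pair per held-out word,
    i.e. the probability each model assigns to that word in its context.
    """
    if not prob_pairs:
        raise ValueError("need at least one held-out word")
    for _ in range(iterations):
        posteriors = []
        for p_adapted, p_background in prob_pairs:
            mixture = interpolate(p_adapted, p_background, lam)
            if mixture <= 0.0:
                raise ValueError("both models assign zero probability")
            # Posterior probability that the adapted model produced the word.
            posteriors.append(lam * p_adapted / mixture)
        lam = sum(posteriors) / len(posteriors)
    return lam
```

When the adapted model explains the held-out words better than the background model, the estimate moves toward 1; when the two models are equally good, it stays near 0.5. Re-estimating the coefficient for each input segment is what would make it "dynamic" in the sense the abstract uses.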

Speech Corpus for Korean as a Foreign Language and the Aspects of the Foreign Learners' Acquisition of the Phonetic and Phonological Systems in the Korean Language (외국어로서의 한국어 음성 코퍼스 구축과 이를 통한 외국인의 한국어 음성·음운체계 습득 양상 연구)

  • Rhee, Seok-Chae;Kim, Jeong-Ah;Chang, Chae-Woong
    • Proceedings of the KSPS conference / 2005.04a / pp.29-33 / 2005
  • This study aims to establish a speech corpus for Korean as a foreign language (L2 Korean Speech Corpus, L2KSC) and to examine aspects of foreign learners' acquisition of the phonetic and phonological systems of the Korean language. In the first year of this project, L2KSC will be established through the process of organizing the reading list, recording, and slicing; the second year includes an in-depth study of aspects of foreign learners' acquisition of Korean and a contrastive analysis of phonetic and phonological systems. The expectation is that this project will provide a significant basis for a variety of fields such as Korean education, academic research, and technological development of phonetic information.

Implementation and Evaluation of an HMM-Based Speech Synthesis System for the Tagalog Language

  • Mesa, Quennie Joy;Kim, Kyung-Tae;Kim, Jong-Jin
    • MALSORI / v.68 / pp.49-63 / 2008
  • This paper describes the development and assessment of a hidden Markov model (HMM) based speech synthesis system for Tagalog, the most widely spoken indigenous language of the Philippines. Several aspects of the design process are discussed. To build the synthesizer, a speech database is recorded and phonetically segmented. The constructed speech corpus contains approximately 89 minutes of Tagalog speech organized in 596 spoken utterances. Furthermore, contextual information is determined. The quality of the synthesized speech is assessed by subjective tests with 25 native Tagalog speakers as respondents. Experimental results show that the new system obtains a mean opinion score (MOS) of 3.29, which indicates that it can produce highly intelligible neutral Tagalog speech with stable quality even when a small amount of speech data is used for HMM training.

A Corpus-based study on the Effects of Gender on Voiceless Fricatives in American English

  • Yoon, Tae-Jin
    • Phonetics and Speech Sciences / v.7 no.1 / pp.117-124 / 2015
  • This paper investigates the acoustic characteristics of English fricatives in the TIMIT corpus, with a special focus on the role of gender in rendering fricatives in American English. The TIMIT database includes 630 talkers and 2342 different sentences, comprising over five hours of speech. Acoustic analyses of spectral and temporal properties are conducted with gender treated as an independent factor. The results of the acoustic analyses revealed that most acoustic properties of voiceless sibilants differ between male and female speakers, whereas those of voiceless non-sibilants do not. A classification experiment using linear discriminant analysis (LDA) revealed that 85.73% of voiceless fricatives are correctly classified. The sibilants are 88.61% correctly classified, whereas the non-sibilants are only 57.91% correctly classified. The majority of the errors come from the misclassification of /θ/ as [f]. The average accuracy of gender classification is 77.67%. Most of the inaccuracy comes from the classification of female speakers in non-sibilants. The results are accounted for by appeal to biological differences as well as macro-social factors. The paper contributes to the understanding of the role of gender in a large-scale speech corpus.

Current States and Future Plans at SiTEC for Speech Corpora for Common Use (SiTEC의 공동 이용을 위한 음성 코퍼스 구축 현황 및 계획)

  • Kim Bong-Wan;Choi Dae-Lim;Kim Young-Il;Lee Kwang-Hyun;Lee Yong-Ju
    • MALSORI / no.46 / pp.175-185 / 2003
  • To support the speech information technology industry, it is vital to create and distribute standardized speech corpora for use in developing products and technologies. In this article we introduce the speech corpora created by the Speech Information Technology & Industry Promotion Center (SiTEC) during its 1st and 2nd fiscal years (2001/5/1-2003/4/30), and the plans for corpora that are currently being created or will be created in the near future. We introduce the corpus for car applications, intended to extend speech information technology to traditional industry; the corpora for foreign languages, intended to support exports; the corpus for basic research aimed at industrial application; the corpora for common use; and others.

The Optimal and Complete Prompts Lists Generation Algorithm for Connected Spoken Word Speech Corpus (연결 단어 음성 인식기 학습용 음성DB 녹음을 위한 최적의 대본 작성 알고리즘)

  • 유하진
    • The Journal of the Acoustical Society of Korea / v.23 no.2 / pp.187-191 / 2004
  • This paper describes an efficient algorithm for generating compact and complete prompt lists for a connected spoken word speech corpus. In building a connected spoken digit recognizer, we have to acquire speech data in various contexts. However, in many speech databases the lists are made with random generators. We provide an efficient algorithm that can generate compact and complete lists of digits in various contexts. This paper includes a proof of the optimality and completeness of the algorithm.
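One way to see what a compact and complete digit list could look like is sketched below. The abstract does not define "complete" or give the algorithm, so this is only an illustration under an assumption: that completeness means every digit occurs immediately after every digit (including itself), and compactness means each such pair occurs exactly once. Under that assumption the problem is an Eulerian circuit on a complete directed graph, solved here with Hierholzer's algorithm. The function name `pair_covering_sequence` is hypothetical and this is not claimed to be the paper's method.

```python
def pair_covering_sequence(symbols):
    """Return a sequence in which every ordered pair of symbols occurs
    exactly once as adjacent items (length n * n + 1 for n symbols).
    """
    symbols = list(dict.fromkeys(symbols))  # drop duplicates, keep order
    if not symbols:
        return []

    # Each ordered pair (a, b) is an edge a -> b. Every node has the same
    # in-degree and out-degree, so an Eulerian circuit exists.
    unused = {a: list(reversed(symbols)) for a in symbols}

    stack = [symbols[0]]
    circuit = []
    while stack:
        node = stack[-1]
        if unused[node]:
            stack.append(unused[node].pop())  # follow an unused edge
        else:
            circuit.append(stack.pop())       # dead end: emit the node
    circuit.reverse()
    return circuit
```

Calling `pair_covering_sequence("0123456789")` yields 101 digits containing each of the 100 ordered digit pairs once, whereas a randomly generated list needs to be considerably longer before all pairs are likely to appear. A real prompt list would then be cut into utterances of a practical length, overlapping by one digit at each cut so that no pair is lost.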