• Title/Summary/Keyword: spoken word

Performance Evaluation of an Automatic Distance Speech Recognition System (Design of a Distant Voice Command Recognition System)

  • Oh, Yoo-Rhee;Yoon, Jae-Sam;Park, Ji-Hoon;Kim, Min-A;Kim, Hong-Kook;Kong, Dong-Geon;Myung, Hyun;Bang, Seok-Won
    • Proceedings of the IEEK Conference / 2007.07a / pp.303-304 / 2007
  • In this paper, we implement an automatic distance speech recognition system for voice-enabled services. We first construct a baseline automatic speech recognition (ASR) system whose acoustic models are trained on utterances recorded with a close-talking microphone. To improve the performance of the baseline ASR system on distance speech, the acoustic models are adapted to compensate for the spectral differences between microphones and for the environmental mismatch between close-talking and distance speech. Next, we develop a voice activity detection algorithm for distance speech. We compare the performance of the baseline system and the developed ASR system on the PBW (Phonetically Balanced Word) 452 task. The results show that the developed ASR system achieves an average word error rate (WER) reduction of 30.6% compared with the baseline ASR system.

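
The result above is reported as a WER reduction. As a reminder of how WER is usually computed, here is a minimal Python sketch assuming the standard definition (word-level Levenshtein errors divided by the number of reference words). The abstract does not say whether 30.6% is an absolute or a relative reduction, so the sketch exposes both; if each PBW test utterance is a single word, as the task name suggests, WER reduces to the fraction of misrecognized words. The function names are hypothetical.

```python
from typing import List, Sequence


def word_edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Minimum number of word substitutions, deletions, and insertions turning ref into hyp."""
    # prev[j] = distance between the first i-1 reference words and the first j hypothesis words
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(prev[j] + 1,          # delete ref[i-1]
                          curr[j - 1] + 1,      # insert hyp[j-1]
                          prev[j - 1] + cost)   # match or substitute
        prev = curr
    return prev[len(hyp)]


def wer(refs: List[str], hyps: List[str]) -> float:
    """Corpus WER: total edit errors divided by total reference words."""
    errors = sum(word_edit_distance(r.split(), h.split()) for r, h in zip(refs, hyps))
    n_words = sum(len(r.split()) for r in refs)
    return errors / n_words


def relative_reduction(baseline_wer: float, new_wer: float) -> float:
    """Fraction of the baseline's errors removed; (baseline_wer - new_wer) is the absolute reduction."""
    return (baseline_wer - new_wer) / baseline_wer
```

For example, `wer(["hello", "world"], ["hello", "word"])` is 0.5, and going from a 20% to a 10% WER is a 10-point absolute but a 50% relative reduction.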

Speech Coarticulation Database of Korean and English (Construction of a Korean-English Coarticulation Database)

  • ;Stephen A. Dyer;Dwight D. Day
    • The Journal of the Acoustical Society of Korea / v.18 no.3 / pp.17-26 / 1999
  • We present the first speech coarticulation database of Korean, English, and Konglish, named "SORIDA", which is designed to cover the maximum number of representations of coarticulation in these languages [1]. SORIDA is a compact database designed to contain the maximum number of triphones in the minimum number of prompts. It contains all consonantal triphones and vowel allophones in 682 word-length Korean prompts and 717 English prompt words, each spoken five times by speakers balanced for gender, dialect, and age. The Korean prompts are synthesized lexicons that maximize coarticulation variation while disregarding stress, whereas the English prompts are natural words that fully reflect the effects of stress on coarticulation variation. The prompts are designed differently because English phonology has stress while Korean phonology does not. Konglish, an intermediate language, was also modeled by having two Korean speakers read the 717 English prompt words. Recording was done in a controlled laboratory environment with an AKG Model C-100 microphone and a Fostex D-5 digital audio tape (DAT) recorder. The total recording time was four hours. The SORIDA CD-ROM is available as a single disc at a 22.05 kHz sampling rate with 16-bit samples. SORIDA digital audio tapes are available as four 124-minute tapes at a 48 kHz sampling rate. SORIDA's list of phonetically rich words is also available in English and Korean.

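
The abstract states SORIDA's design goal (maximum triphone coverage in a minimum number of prompts) but not how the prompts were selected. A common heuristic for this kind of coverage problem is greedy set cover, sketched below in Python. It assumes each candidate prompt is already transcribed as a list of phone symbols and treats every consecutive phone triple as a triphone; the phone symbols are placeholders, and this is not presented as SORIDA's actual procedure.

```python
from typing import Dict, List, Set, Tuple

Triphone = Tuple[str, ...]


def triphones(phones: List[str]) -> Set[Triphone]:
    """All left-center-right phone triples in one prompt's phone sequence."""
    return {tuple(phones[i:i + 3]) for i in range(len(phones) - 2)}


def greedy_select(prompts: Dict[str, List[str]]) -> List[str]:
    """Repeatedly pick the prompt that adds the most not-yet-covered triphones."""
    coverage = {name: triphones(p) for name, p in prompts.items()}
    covered: Set[Triphone] = set()
    chosen: List[str] = []
    while True:
        # Candidate with the largest number of new triphones (first one wins ties).
        best = max(coverage, key=lambda n: len(coverage[n] - covered), default=None)
        if best is None or not (coverage[best] - covered):
            break  # nothing left, or no remaining prompt adds coverage
        chosen.append(best)
        covered |= coverage.pop(best)
    return chosen
```

Greedy selection does not guarantee the smallest possible prompt set, but it is a standard approximation for set cover and is easy to audit.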

Language Specific Variations of Domain-initial Strengthening and its Implications on the Phonology-Phonetics Interface: with Particular Reference to English and Hamkyeong Korean

  • Kim, Sung-A
    • Speech Sciences / v.11 no.3 / pp.7-21 / 2004
  • The present study investigates domain-initial strengthening, i.e., the strengthening of articulatory gestures at the initial positions of prosodic domains. More specifically, this paper presents the results of an experimental study of the vowels of initial syllables with onset consonants (henceforth, initial-syllable vowels) in various prosodic domains in English and Hamkyeong Korean, a pitch-accent dialect spoken in the northern part of North Korea. For both languages, the durations of initial-syllable vowels are compared with those of second vowels in real-word tokens, controlling for both stress and segmental environment. Hamkyeong Korean, like English, turned out to strengthen domain-initial consonants. With regard to vowel duration, no significant prosodic effect was found in English. Hamkyeong Korean, on the other hand, showed significant differences between the durations of initial and non-initial vowels in the higher prosodic domains. The theoretical implications of the findings are twofold. First, the potentially universal phenomenon of initial strengthening is subject to language-specific variation in its implementation. Second, and more importantly, the distinct phonetics-phonology model (Pierrehumbert & Beckman, 1998; Keating, 1990; Cohn, 1993) is better equipped to account for the present findings.


Optimizing Multiple Pronunciation Dictionary Based on a Confusability Measure for Non-native Speech Recognition (Confusability-Based Optimization of a Multiple Pronunciation Dictionary for Recognizing Non-native Speakers' Speech)

  • Kim, Min-A;Oh, Yoo-Rhee;Kim, Hong-Kook;Lee, Yeon-Woo;Cho, Sung-Eui;Lee, Seong-Ro
    • MALSORI / no.65 / pp.93-103 / 2008
  • In this paper, we propose a method for optimizing a multiple pronunciation dictionary used to model pronunciation variations in non-native speech. The proposed method removes confusable pronunciation variants from the dictionary, reducing both the dictionary size and the decoding time of automatic speech recognition (ASR). To this end, a confusability measure is first defined based on the Levenshtein distance between two pronunciation variants. Then, the number of phonemes in each pronunciation variant is incorporated into the confusability measure to compensate for ASR errors caused by shorter words. We investigate the effect of the proposed method on ASR performance, with Korean as the target language and Korean utterances spoken by native speakers of Chinese as the non-native speech. The experiments show that an ASR system using the multiple pronunciation dictionary optimized by the proposed method achieves a relative average word error rate reduction of 6.25% with 11.67% less decoding time, compared with a system using the unoptimized multiple pronunciation dictionary.

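
The abstract describes the confusability measure only in outline (a Levenshtein distance between pronunciation variants, adjusted by phoneme count), so the exact formula is not known here. The Python sketch below is a hypothetical stand-in: `confusability` scores a pair of phoneme sequences by inverse edit distance, boosted for short variants, and `prune` drops any non-canonical variant that scores at or above a threshold against some variant of a different word. The scoring formula, the `alpha` weight, the threshold, and the rule that each word's first variant is canonical are all assumptions, not the authors' method.

```python
from typing import Dict, List, Sequence

Dictionary = Dict[str, List[List[str]]]


def levenshtein(a: Sequence[str], b: Sequence[str]) -> int:
    """Phoneme-level edit distance (unit cost for substitution, insertion, deletion)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = curr
    return prev[-1]


def confusability(a: Sequence[str], b: Sequence[str], alpha: float = 1.0) -> float:
    # HYPOTHETICAL measure: closer pairs score higher, and the alpha / min-length
    # term raises the score for short variants, which are more error-prone in ASR.
    # Assumes both variants are non-empty.
    return (1.0 + alpha / min(len(a), len(b))) / (1.0 + levenshtein(a, b))


def prune(dictionary: Dictionary, threshold: float) -> Dictionary:
    """Keep each word's first (assumed canonical) variant; drop other variants
    that are too confusable with any variant of a different word."""
    pruned: Dictionary = {}
    for word, variants in dictionary.items():
        others = [v for w, vs in dictionary.items() if w != word for v in vs]
        kept = [variants[0]]
        for v in variants[1:]:
            if all(confusability(v, o) < threshold for o in others):
                kept.append(v)
        pruned[word] = kept
    return pruned
```

Lowering the threshold removes more variants, trading pronunciation coverage for a smaller dictionary and faster decoding.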

Phonetic investigation of epenthetic vowels produced by Korean learners of English

  • Shin, Dong-Jin;Iverson, Paul
    • Phonetics and Speech Sciences / v.6 no.4 / pp.17-26 / 2014
  • The present study examined epenthetic vowels produced by Korean learners of English in read sentences, in terms of acoustic measures and extra-phonological factors. The results revealed three main findings. First, epenthetic vowels had relatively high F1 values and a wide range of F2 values. Most epenthetic vowels were realized close to the Korean high central vowel, but some were realized close to front vowels owing to coarticulation with surrounding vowels. Second, vowel epenthesis was affected by context: it occurred frequently at word junctions between obstruents (e.g., stop-fricative sequences). Third, the learners were not affected by English experience and were only very weakly affected by orthography. English experience, one of the extra-phonological factors, was not related to epenthesis production, whereas orthography, the other extra-phonological factor, very weakly affected the amount of epenthesis. Nine percent of all epenthetic vowels were associated with the English past-tense suffix '-ed', and approximately 70% of the participants were affected by this suffix. These findings contribute to the understanding of vowel epenthesis in two ways. First, they show that the epenthetic vowels produced by Korean learners of English were close to the high central vowel, supporting previous studies reporting that the epenthetic vowel is close to the shortest vowel. Second, by examining the various phonetic environments of epenthetic vowels, the study revealed that vowel epenthesis occurs more frequently in certain phonetic environments.

Prosodic Annotation in a Thai Text-to-speech System

  • Potisuk, Siripong
    • Proceedings of the Korean Society for Language and Information Conference / 2007.11a / pp.405-414 / 2007
  • This paper describes preliminary work on the prosody-modeling component of a text-to-speech system for Thai. Specifically, the model is designed to predict symbolic markers (i.e., prosodic phrase boundaries, accents, and intonation boundaries) from text and then to use these markers to generate pitch, intensity, and durational patterns for the synthesis module of the system. In this paper, a novel method for annotating the prosodic structure of Thai sentences based on a dependency representation of syntax is presented. The goal of the annotation process is to predict, from text, the rhythm of the input sentence when it is spoken according to its intended meaning. The prosodic structure is encoded by minimizing speech disrhythmy while maintaining congruency with syntax. That is, each word in the sentence is assigned a prosodic feature called strength dynamic, which is based on the dependency representation of syntax. The assigned strength dynamics are then used to obtain rhythmic groupings in terms of a phonological unit called the foot. Finally, the foot structure is used to predict the durational pattern of the input sentence. The process has been tested on a set of ambiguous sentences representing various structural ambiguities involving five types of compounds in Thai.


Design of Model to Recognize Emotional States in a Speech

  • Kim Yi-Gon;Bae Young-Chul
    • International Journal of Fuzzy Logic and Intelligent Systems / v.6 no.1 / pp.27-32 / 2006
  • Verbal communication is the most commonly used means of communication. A spoken word carries a great deal of information about speakers and their emotional states. In this paper, we designed a model to recognize emotional states in speech, the first of two phases in developing a toy machine that recognizes emotional states in speech. We conducted an experiment to extract and analyze a speaker's emotional state from speech. To analyze the signal, we used three characteristics of sound as input vectors: frequency, intensity, and tone period. We also used eight basic emotional parameters (surprise, anger, sadness, expectancy, acceptance, joy, hate, and fear), which were portrayed by five selected students. To facilitate differentiation of the spectral features, we used wavelet transform analysis. We applied ANFIS (Adaptive Neuro-Fuzzy Inference System) to design the emotion recognition model. The inference error was about 10%, and the results indicate that the applied model is about 85% effective and reliable.
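
The abstract says wavelet transform analysis was used to separate spectral features but does not name the wavelet or the features derived from it. Purely as an illustration, the Python sketch below assumes the Haar wavelet and uses the energy of each decomposition band as a feature; both choices are assumptions, and the sketch stops before the ANFIS stage, which is not shown.

```python
import math
from typing import List, Tuple


def haar_step(x: List[float]) -> Tuple[List[float], List[float]]:
    """One level of the orthonormal Haar DWT, returning (approximation, detail)."""
    if len(x) % 2:
        raise ValueError("signal length must be even at every level")
    s = 1.0 / math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) * s for i in range(0, len(x), 2)]  # low-pass half
    detail = [(x[i] - x[i + 1]) * s for i in range(0, len(x), 2)]  # high-pass half
    return approx, detail


def band_energies(x: List[float], levels: int) -> List[float]:
    """Detail-band energy at each level (finest first), then the final approximation energy."""
    energies = []
    for _ in range(levels):
        x, detail = haar_step(x)
        energies.append(sum(c * c for c in detail))
    energies.append(sum(c * c for c in x))
    return energies
```

Because this Haar transform is orthonormal, the band energies sum to the energy of the input frame, which is a convenient sanity check before feeding them to a classifier.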

On Improving the Listening Ability of Middle School Students Using Verbotonal Method (A Study on Improving the Listening Ability of Middle School English Learners Using the Verbotonal Method)

  • Kim, Hyun-Gi;Kim, Ok-Jin;Kang, Sung-Kwan;Jeon, Byoung-Man
    • Speech Sciences / v.14 no.3 / pp.21-29 / 2007
  • The need to improve the English listening ability of Korean learners has been emphasized since the goal of English education in Korea shifted to CLT (Communicative Language Teaching). The Verbotonal Approach, an auditory-based strategy, has proven substantially effective in maximizing listening skills for spoken foreign languages. The purpose of this study is to find an efficient way to improve the listening ability of Korean middle school students by employing OFH (Optimal Frequency of Hearing), measured with the Tonality Word Sentence Test before and after training with the Listen II Verbotonal training unit based on VTS (Verbotonal System). The listening tests showed that, after training with Listen II, the subjects' listening ability increased by 16.7% for words and by 5.5% for sentences, and that the improvement at the word level was much higher than at the sentence level. These results suggest that listening training with mid-tonality and low-tonality words based on OFH may substantially improve the listening ability of Korean learners of English.


Difference, not Differentiation: The Thingness of Language in Sun Yung Shin's Skirt Full of Black

  • Shin, Haerin
    • Journal of English Language & Literature / v.64 no.3 / pp.329-345 / 2018
  • Sun Yung Shin's poetry collection Skirt Full of Black (2007) brings the author's personal history as a Korean female adoptee to bear upon poetic language in daring formal experiments, instantiating the liminal state of being shuttled across borders to land in an in-between state of marginalization. Other Korean American poets, such as Lee Herrick and Jennifer Kwon Dobbs, have also drawn on the experience of transnational adoption and racialization to explore the literary potential of English to materialize haunting memories, or the untranslatable yet persistent echoes of a lost home that gestures across linguistic boundaries. Shin, however, dismantles the referential foundation of English, the language she was transplanted into, through formal transgressions such as frazzled syntax, atypical typography, decontextualized punctuation marks, and phonetic and visual play. The power to signify and thereby differentiate one entity or meaning from another dissipates in the cacophonic feast of signs in Skirt Full of Black; the word fragments of identificatory markers that turn racialized, gendered, and culturally contained subjects into exotic things lose the power to define them as such, and instead become alterities by departing from the conventional meaning-making dynamics of language. Expanding on the avant-garde legacy of Korean American poets Theresa Hak Kyung Cha and Myung Mi Kim to delve further into the liminal space between Korean and American, referential and representational, or spoken and written words, Shin carves out a space for discreteness that does not subscribe to the hierarchical ontology of differential value assignment.

Urdu News Classification using Application of Machine Learning Algorithms on News Headline

  • Khan, Muhammad Badruddin
    • International Journal of Computer Science & Network Security / v.21 no.2 / pp.229-237 / 2021
  • Our modern 'information-hungry' age demands the delivery of information at unprecedented speed. Timely delivery of noteworthy information about recent events can help people from different walks of life in a number of ways. As the world has become a global village, the volume and speed of the news flow require machines to help humans handle the enormous amount of data. News is presented to the public as video, audio, images, and text. News text available on the Internet is a source of knowledge for billions of Internet users. Urdu is spoken and understood by millions of people in the Indian subcontinent, and the availability of online Urdu news enables them to improve their understanding of the world and make decisions. This paper uses available online Urdu news data to train machines to automatically categorize news. Various machine learning algorithms were trained on news headlines, and the results demonstrate that the Bernoulli Naïve Bayes (Bernoulli NB) and Multinomial Naïve Bayes (Multinomial NB) algorithms outperformed the other algorithms on all performance measures. The highest accuracy achieved on the dataset was 94.278%, by the Multinomial NB classifier, followed by the Bernoulli NB classifier at 94.274%, when Urdu stop words were removed from the dataset. The results suggest that short news headlines can be used as input for text categorization.
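
The abstract reports that Multinomial NB performed best on stop-word-filtered headlines but gives no implementation details (the paper may well have used a library). For readers unfamiliar with the classifier, the Python sketch below implements multinomial naive Bayes from scratch with add-alpha (Laplace) smoothing and a stop-word filter. The whitespace tokenizer, the smoothing value, and the romanized placeholder stop words are assumptions; a real Urdu pipeline would need proper Urdu tokenization and a full stop-word list.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, Iterable, List


class MultinomialNB:
    """Multinomial naive Bayes over word counts with add-alpha (Laplace) smoothing."""

    def __init__(self, alpha: float = 1.0, stop_words: Iterable[str] = ()):
        self.alpha = alpha
        self.stop_words = set(stop_words)

    def _tokens(self, text: str) -> List[str]:
        # Whitespace tokenization, then stop-word removal (an assumed, simplistic tokenizer).
        return [t for t in text.split() if t not in self.stop_words]

    def fit(self, docs: List[str], labels: List[str]) -> "MultinomialNB":
        self.class_counts = Counter(labels)  # documents per class
        self.word_counts: Dict[str, Counter] = defaultdict(Counter)  # word counts per class
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(self._tokens(doc))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.total_words = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        return self

    def _log_posterior(self, tokens: List[str], c: str) -> float:
        n_docs = sum(self.class_counts.values())
        score = math.log(self.class_counts[c] / n_docs)  # log prior
        denom = self.total_words[c] + self.alpha * len(self.vocab)
        for w in tokens:
            if w in self.vocab:  # words never seen in training are skipped
                score += math.log((self.word_counts[c][w] + self.alpha) / denom)
        return score

    def predict(self, doc: str) -> str:
        tokens = self._tokens(doc)
        return max(self.class_counts, key=lambda c: self._log_posterior(tokens, c))
```

Bernoulli NB differs mainly in modeling whether each vocabulary word is present or absent in a document, including the absent ones, rather than how often each word occurs.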