• Title/Summary/Keyword: Speech sound


Detecting Prominent Content in Unstructured Audio using Intensity-based Attack/release Patterns (발생/소멸 패턴을 이용한 비정형 혼합 오디오의 주성분 검출)

  • Kim, Samuel
    • Journal of the Institute of Electronics and Information Engineers / v.50 no.12 / pp.224-231 / 2013
  • Defining prominent audio content as the audio content that is most informative from the users' perspective within a given unstructured audio segment, we propose simple but robust intensity-based attack/release pattern features to detect it. We also propose a web-based annotation procedure to collect users' subjective perception, and we annotated 18 hours of video clips across various genres, such as cartoons, movies, and news. Experiments with a linear classification method whose models are trained for speech, music, and sound effects show promising results that vary across program genres (e.g., 86.7% weighted accuracy for speech-oriented talk shows and 49.3% weighted accuracy for action movies).
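The abstract does not spell out how the attack/release pattern features are computed. As a rough illustration only, the sketch below assumes that an "attack" is a frame-to-frame rise in intensity and a "release" is a fall, each above a threshold. The function names, the frame length, and the 3 dB threshold are hypothetical choices, not the author's method.

```python
import math

def frame_intensity_db(samples, frame_len=400):
    """Return per-frame RMS intensity in dB for a list of float samples."""
    levels = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        rms = math.sqrt(sum(x * x for x in frame) / frame_len)
        # Floor the RMS so that silent frames do not produce log10(0).
        levels.append(20.0 * math.log10(max(rms, 1e-10)))
    return levels

def attack_release_features(levels_db, threshold_db=3.0):
    """Summarize intensity rises (attacks) and falls (releases).

    Assumption: an attack/release is any frame-to-frame change of at
    least threshold_db; the paper may define these patterns differently.
    """
    deltas = [b - a for a, b in zip(levels_db, levels_db[1:])]
    attacks = [d for d in deltas if d >= threshold_db]
    releases = [-d for d in deltas if d <= -threshold_db]
    n = max(len(deltas), 1)
    return {
        "attack_rate": len(attacks) / n,
        "release_rate": len(releases) / n,
        "mean_attack_db": sum(attacks) / len(attacks) if attacks else 0.0,
        "mean_release_db": sum(releases) / len(releases) if releases else 0.0,
    }

# Toy signal: one silent frame, one loud frame, one silent frame.
signal = [0.0] * 400 + [0.5] * 400 + [0.0] * 400
features = attack_release_features(frame_intensity_db(signal))
```

A feature dictionary like this could then be passed to any linear classifier. Which classifier and which feature set the paper actually used is not stated in the abstract.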

A Study on the Natural Language Generation by Machine Translation (영한 기계번역의 자연어 생성 연구)

  • Hong Sung-Ryong
    • Journal of Digital Contents Society / v.6 no.1 / pp.89-94 / 2005
  • In machine translation, the goal of natural language generation is to produce a target sentence that conveys the meaning of the source sentence, using a parse tree of the source sentence and target expressions. It provides the generator with linguistic structures, word mappings, parts of speech, and lexical information. The purpose of this study is to investigate characteristics of Korean that could be used to establish an algorithm for speech recognition and speech synthesis. This is part of a plan to realize automatic machine translation. The stages of MT are divided into morphemic analysis, semantic analysis, and syntactic construction.


Aspects of Chinese Korean learners' production of Korean aspiration at different prosodic boundaries (운율 층위에 따른 중국인학습자들의 한국어 유기음화 적용 양상)

  • Yune, Youngsook
    • Phonetics and Speech Sciences / v.9 no.4 / pp.9-17 / 2017
  • The aim of this study is to examine whether Chinese Korean learners (CKL) can correctly produce aspiration in 'lenis obstruent (/k/, /t/, /p/, /ʧ/) + /h/' sequences at the lexical and post-lexical levels. For this purpose, 4 Korean native speakers (KNS), 10 advanced CKL, and 10 intermediate CKL participated in a production test. The material analyzed consisted of 10 Korean sentences in which aspiration can be applied at different prosodic boundaries (syllable, word, accentual phrase). The results showed that for both KNS and CKL, the rate of application of aspiration differed according to prosodic boundary. Aspiration was applied more frequently at the lexical level than at the post-lexical level, and more frequently at the word boundary than at the accentual phrase boundary. For CKL, pronunciation errors were either non-application of aspiration or coda obstruent omission. In the case of non-application of aspiration, CKL produced the target syllable in its underlying form and did not transform it into a surface form. In the case of coda obstruent omission, most of the errors were caused by the inherent complexity of the phonological process.

Design and Implementation of a Navigation System for Visually Impaired Persons (시각장애인을 위한 네비게이션 시스템 설계 및 구현)

  • Jang, Su-Min;Hwang, Dong-Gyo;Kang, Soo;Kim, Eun-Ju;Park, Jun-Ho;Jang, Ki-Hun;Yoo, Jae-Soo
    • The Journal of the Korea Contents Association / v.12 no.1 / pp.38-47 / 2012
  • To extend the activity range of visually impaired persons, we design and implement a navigation system that supports road information services and points of interest. The proposed navigation system consists of route creation modules and storage modules for visually impaired persons. In particular, the main interface of the navigation system is implemented using a TTS (Text-to-Speech) program for sound output and a braille module that outputs braille for the sense of touch. We also use Google Maps APIs, which can provide the latest map information for the navigation system.

Interactive Game Designed for Early Child using Multimedia Interface : Physical Activities (멀티미디어 인터페이스 기술을 이용한 유아 대상의 체감형 게임 설계 : 신체 놀이 활동 중심)

  • Won, Hye-Min;Lee, Kyoung-Mi
    • The Journal of the Korea Contents Association / v.11 no.3 / pp.116-127 / 2011
  • This paper proposes interactive game elements for children: contents, design, sound, gesture recognition, and speech recognition. Interactive games for young children must use contents that reflect educational needs and design elements that are bright, friendly, and simple to use. The games should also use background music that is familiar to children and narration that makes the games easy to play. For gesture recognition and speech recognition, the interactive games must use gesture and voice data that suit the age of the game user. This paper also introduces the development process for an interactive skipping game and applies child-oriented contents, gestures, and voices to the game.

Korean Students' Repetition of English Sentences Under Noise and Speed Conditions (소음과 속도를 변화시킨 영어 문장 따라하기에 대한 연구)

  • Kim, Eun-Jee;Yang, Byung-Gon
    • Speech Sciences / v.11 no.2 / pp.105-117 / 2004
  • Recently, many scholars have emphasized the importance of English listening ability for smoother communication. Most audio materials, however, are recorded in a quiet sound-proof booth. Therefore, students who have spent much time listening to such ideal audio materials are expected to have difficulty communicating with native speakers in real life. In this study, we examined how well thirty-three Korean university students and five native speakers could repeat recorded English sentences under noise and speed conditions. The subjects' production was scored by listening to each recorded sentence, counting the number of words correctly produced, and computing the percentage of correctly produced words out of the total words in each sentence. Results showed that the student group correctly repeated around 65% of the words in each sentence, while the native speakers demonstrated an almost perfect match. The students appeared to have difficulty perceiving and repeating function words in various conditions. The high-proficiency student group also outperformed the low-proficiency student group, particularly in the repetition of function words. In addition, the students' repetition accuracy dropped markedly when the normal sentences were both sped up and mixed with noise. Finally, the Korean students' percent-correct ratio fell as the stimulus sentence became longer.
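The scoring described above (the percentage of correctly produced words out of the total words in a sentence) can be sketched as follows. This is a minimal illustration, not the authors' procedure: the original scoring was done by human listeners, and the function name `percent_words_correct` and the order-insensitive matching rule are assumptions.

```python
from collections import Counter

def percent_words_correct(target, produced):
    """Percentage of target words that also appear in the produced sentence.

    Assumption: matching ignores case and word order, and each produced
    word can be credited against at most one target word.
    """
    target_words = target.lower().split()
    produced_counts = Counter(produced.lower().split())
    correct = 0
    for word in target_words:
        if produced_counts[word] > 0:
            produced_counts[word] -= 1
            correct += 1
    return 100.0 * correct / len(target_words) if target_words else 0.0

# Both occurrences of the function word "the" are dropped in the repetition,
# so 4 of the 6 target words are credited.
score = percent_words_correct("the cat sat on the mat", "cat sat on mat")
```

Averaging this score over sentences within a condition (noise, speed, or both) would give per-condition ratios of the kind reported in the abstract.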


An Experimental Study on English Vowel Lengths as Produced by Korean College Students in Chungnam and Gyungnam Provinces (충남.경남지역 대학생들의 영어모음 발음길이에 대한 실험적 연구)

  • Park, Hee-Suk;Kim, Jung-Sook
    • Speech Sciences / v.10 no.3 / pp.157-173 / 2003
  • The purpose of this experimental study is to investigate and compare the vowel lengths of English diphthongs and low vowels produced by native-English-speaking Americans and by Korean college students from the Chungnam and Gyungnam provinces. Eight words and sixteen sentences were uttered five times by twenty-five subjects from three groups: 1) Chungnam dialect speakers, 2) Gyungnam dialect speakers, and 3) five native-English-speaking Americans. Acoustic features (duration) were measured from sound spectrograms made by the PC Quire. Results showed that the vowel lengths of English diphthongs and low vowels differed between the native English speakers and the Korean college students of the Chungnam and Gyungnam provinces. Comparing the average length of English diphthongs, native English speakers tend to pronounce them shorter than the Korean college students do. However, native English speakers tend to pronounce the English low vowels longer than the Korean college students do. In this study we also examined differences in the lengths of English diphthongs and low vowels in relation to utterance position among the American speakers and the Chungnam and Gyungnam dialect speakers. We observed a lengthening effect in all three groups. However, the lengthening effect on English vowels was more clearly observed in the pronunciation of the American speakers, especially for English diphthongs.


Chinese KFL learners' production aspects of post-lexical phonological process in Korean - Focusing on the nasalization - (운율구 형성과정에서 나타나는 어휘부와 후어휘부 필수음운현상에 대한 중국인학습자들의 발화양상 -비음화를 중심으로-)

  • Yune, Youngsook
    • Phonetics and Speech Sciences / v.8 no.1 / pp.53-62 / 2016
  • In this study, we examined whether Chinese learners of Korean can correctly produce a phonological process at the lexical and post-lexical levels. For this purpose, 4 Korean native speakers, 10 advanced Chinese learners of Korean, and 10 intermediate Chinese learners of Korean participated in a production test. The materials analyzed consisted of 10 Korean sentences in which nasalization can be applied at the syllable boundary, the word boundary (w-boundary), and the accentual phrase boundary (AP-boundary). The results show that for Korean speakers, nasalization was applied 100% of the time at all levels, whereas for Chinese speakers, the rate of application of nasalization differed according to prosodic constituent and Korean proficiency. Nasalization was applied more frequently at the lexical level than at the post-lexical level, and more frequently in the w-boundary conditions than in the AP-boundary conditions. However, the rate of nasalization at the w-boundary was close to that at the lexical level. The pronunciation errors were either non-application of nasalization or coda obstruent omission. In the case of non-application of nasalization, Chinese learners of Korean produced the target syllables in their underlying forms, which were not transformed into surface forms. In addition, coda obstruents were omitted in 'lenis obstruent + nasal sound' sequences, and this omission blocked nasalization.

English Sounds to Japanese Ears

  • Yuichi Endo
    • Proceedings of the KSPS conference / 2000.07a / pp.47-58 / 2000
  • For learners of English as a foreign language, oral repetition of model sentences is an essential practice for improving their English listening and speaking abilities. Skill training in both speech perception and production is involved in this practice. This paper reports on an observation of production errors made in such practice by Japanese college students in my class. The teaching material used is intended to acquaint the learners with basic English rhythm and intonation patterns. The students were required to repeat each sentence in a series of conversations after a model reading. Although the vocabulary and expressions were rather limited, I observed different kinds of errors in their repetition. Putting aside intonation, their difficulties are classified into five types: 1. Omission of words or morphemes, 2. Addition of unnecessary words or morphemes, 3. Replacement of words, 4. Japanization of English sounds, 5. Wrong rhythm caused by improper stress assignment. Accurate listening, especially to weakly stressed syllables and to assimilated sounds, is, as has often been pointed out, the most difficult part of perception for them. The Japanese sound system interferes with the production of English sounds. More often than not, their knowledge of grammar or the context does not help them at all in guessing the words they are hearing.


Effects of base token for stimuli manipulation on the perception of Korean stops among native and non-native listeners

  • Oh, Eunjin
    • Phonetics and Speech Sciences / v.12 no.1 / pp.43-50 / 2020
  • This study investigated whether listeners' perceptual patterns varied according to the base token selected for stimulus manipulation. Voice onset time (VOT) and fundamental frequency (F0) values were orthogonally manipulated, each in seven steps, using naturally produced words that contained a lenis (/kan/) and an aspirated (/kʰan/) stop in Seoul Korean. Both native and non-native groups showed significantly higher numbers of aspirated responses for the stimuli constructed from /kʰan/, evidencing the use of minor cues left in the stimuli after manipulation. For the native group, the use of the VOT and F0 cues in stop categorization did not differ depending on whether the base token included the lenis or the aspirated stop. This indicates that the results of previous studies remain tenable, namely those that investigated the relative importance of the acoustic cues in native listeners' perception of the Korean stop contrasts by using a single base token for manipulating perceptual stimuli. For the non-native group, the patterns of use of the F0 cue differed as a function of the base token selected. Some findings indicated that listeners used alternative cues to identify the stop contrast when the major cues sounded ambiguous. The use of the manipulated VOT and F0 cues by the non-native group was not native-like, suggesting that non-native listeners may have perceived the minor cues as stable in the context of the manipulated cue combinations.
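Orthogonal manipulation means that every VOT step is paired with every F0 step. The sketch below builds such a grid for two base tokens. The endpoint values (20 to 80 ms VOT, 180 to 240 Hz F0), the equal spacing, the token labels, and the helper name `linear_steps` are hypothetical placeholders; the abstract gives only the number of steps.

```python
from itertools import product

def linear_steps(low, high, n_steps=7):
    """Return n_steps equally spaced values from low to high inclusive."""
    step = (high - low) / (n_steps - 1)
    return [low + i * step for i in range(n_steps)]

# Hypothetical continuum endpoints; the actual values are not in the abstract.
vot_ms = linear_steps(20.0, 80.0)
f0_hz = linear_steps(180.0, 240.0)
base_tokens = ["kan", "khan"]  # lenis-based and aspirated-based source words

# Full crossing: each base token receives every VOT x F0 combination.
stimuli = [
    {"base": base, "vot_ms": vot, "f0_hz": f0}
    for base, vot, f0 in product(base_tokens, vot_ms, f0_hz)
]
```

Under the assumption of full crossing, this gives 7 × 7 = 49 cue combinations per base token. Comparing responses to the same VOT and F0 combination across the two base tokens then isolates the effect of residual minor cues.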