• Title/Summary/Keyword: Speech sound

Search Result 627

Vocal Tract Modeling with Unfixed Sectionlength Acoustic Tubes(USLAT) (비고정 구간 길이 음향 튜브를 이용한 성도 모델링)

  • Kim, Dong-Jun
    • The Transactions of The Korean Institute of Electrical Engineers / v.59 no.6 / pp.1126-1130 / 2010
  • Speech production can be viewed as a filtering operation in which a sound source excites a vocal tract filter. In linear prediction acoustic tube modeling, the vocal tract is modeled as a chain of cylinders of varying cross-sectional area. The most common implementation of this model assumes tube sections of equal length, so a large number of sections is needed to model complex vocal tract shapes. This paper proposes a new vocal tract model with unfixed section lengths, which uses a reduced lattice filter to model the vocal tract. The model converts the lattice filter to a reduced structure and uses a correspondingly modified version of the Burg algorithm. When the conventional and proposed models are implemented with the same order of linear prediction analysis, the proposed model produces more accurate results than the conventional one. For a system of similar accuracy, it may therefore be possible to reduce the number of stages in the lattice filter structure. The proposed model also reproduces the vocal tract shape more closely than the conventional one.
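
For orientation, the conventional equal-length baseline that the abstract contrasts against can be sketched as follows. This is a minimal illustration of the standard Burg algorithm and of one common conversion from reflection coefficients to relative tube areas; it is not the paper's reduced lattice filter or its modified Burg algorithm, which the abstract does not specify. The function names, the `first_area` parameter, and the sign convention in `tube_areas` are assumptions made for this sketch (textbooks differ on the sign and on which end of the tract the areas start from).

```python
def burg_reflection(x, order):
    """Estimate lattice reflection coefficients with the standard Burg method.

    x is a list of samples; order must be smaller than len(x) - 1.
    """
    f = list(x[1:])    # forward prediction errors f(n), n = 1..N-1
    b = list(x[:-1])   # backward prediction errors b(n-1), aligned with f
    ks = []
    for _ in range(order):
        num = -2.0 * sum(fi * bi for fi, bi in zip(f, b))
        den = sum(fi * fi for fi in f) + sum(bi * bi for bi in b)
        k = num / den if den > 0.0 else 0.0
        ks.append(k)
        # Lattice update of both error sequences for this stage.
        f_new = [fi + k * bi for fi, bi in zip(f, b)]
        b_new = [bi + k * fi for fi, bi in zip(f, b)]
        # Re-align: drop the first forward and the last backward sample.
        f, b = f_new[1:], b_new[:-1]
    return ks


def tube_areas(ks, first_area=1.0):
    """Convert reflection coefficients to relative areas of equal-length tubes.

    Assumed convention: A[i+1] = A[i] * (1 - k[i]) / (1 + k[i]).
    """
    areas = [first_area]
    for k in ks:
        areas.append(areas[-1] * (1.0 - k) / (1.0 + k))
    return areas


if __name__ == "__main__":
    # Toy decaying signal, used only to exercise the functions.
    signal = [0.5 ** n for n in range(16)]
    ks = burg_reflection(signal, 2)
    print(ks)
    print(tube_areas(ks))
```

In this baseline every lattice stage corresponds to one tube section of the same length, which is why a complex tract shape needs a high analysis order.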

The Smart Learning System for English Language Using Hangeul (한글을 이용한 스마트 영어 학습 시스템)

  • Kwon, Seung-tag;Kim, Yong-seok
    • The Journal of Korean Institute of Communications and Information Sciences / v.40 no.6 / pp.1157-1163 / 2015
  • In this paper, we developed a Web App that runs on mobile devices. We also designed and developed an electronic dictionary in which the pronunciation of English words and sentences is expressed in Hangeul. The system's database contains English words, Hangeul codes with pictures, vocabulary definitions, speech sound files, and many sentences. We developed the English learning system using HTML5 and the m-Bizmaker software tools.

Zero-Crossing-Based Source Direction Estimation Using a Cepstral Prefiltering Technique (영교차점과 켑스트럼 전처리 기술을 이용한 반향환경에서의 음원방향 추정)

  • Park, Yong-Jin;Lee, Soo-Yeon;Park, Hyung-Min
    • MALSORI / no.67 / pp.121-133 / 2008
  • To estimate the directions of multiple sound sources, we consider an approach based on zero crossings, which is more robust to diffuse noise than the conventional cross-correlation-based method [6][7]. In reverberant environments, the performance of source direction estimation can be improved by using the signal components that arrive through direct paths from the sources to the microphones. Since a cepstral prefiltering technique [8] removes the effect of reverberation, we propose a source direction estimation method that finds the intervals of the direct-path components by comparing the original and cepstral-prefiltered envelopes. Simulations demonstrate that the proposed method improves the performance of source direction estimation in reverberant environments.


A Pronunciation Analysis on Korean Point-of-Interest Data (한국어 위치정보 데이터의 발음 분석)

  • Kim, Sun-Hee
    • Proceedings of the KSPS conference / 2007.05a / pp.91-94 / 2007
  • This paper analyzes the pronunciation of Korean Point-of-Interest (POI) data, which consist of 224 sound files, from a phonological point of view, adapting the notion of the prosodic word within the framework of Intonational Phonology. Each POI word is broken down into prosodic words, which are defined as the minimal sequence of segments that can be produced as one Accentual Phrase (AP). The pronunciation of each POI word is then analyzed in terms of its prosodic words. The results show that, in most cases, a prosodic word is realized as one AP; that, in some cases, two prosodic words are pronounced as one AP; and that no cases are found where three prosodic words are realized as one AP.


An Acoustical Study on the Syllable Structures of Korean Numeric Sounds (국어 숫자음의 음절구조에 대한 음향적 분석)

  • Yang, Byung-Gon
    • Proceedings of the KSPS conference / 2007.05a / pp.170-172 / 2007
  • The purpose of this study was to examine the syllable structures of ten Korean numeric sounds produced by ten subjects of the same age. Each sound was normalized and divided into onset, vowel, and coda sections. Acoustical measurements of each syllable were then made to compare the ten sounds. The results showed little deviation from the grand average duration and intensity for most of the sounds, except for the two diphthongal sounds, whose boundary points varied among the speakers. Some syllable boundaries were quite obvious while others were ambiguous. There seemed to be some tradeoff among the syllable components depending on their acoustic features.


On the Classification of Voice Sound and the Recognition of Vowels for Korean Continuous Speech (한국어 연속음인식에 관한 연구(유성음 분류 및 단모음 인식 ))

  • 하판봉;이철희;방승찬;안수길
    • The Journal of the Acoustical Society of Korea / v.5 no.3 / pp.28-35 / 1986
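  • This paper describes an algorithm that classifies the voiced sounds of Korean speech into vowels, nasals, and voiced consonants. Speech is first divided into voiced and unvoiced sounds by an existing pitch detection algorithm. Then, using only the normalized first-order correlation coefficient, the zero-crossing rate, the log energy, and valley detection of the LPG energy, voiced sounds are classified into vowels, nasals, and voiced consonants, and unvoiced sounds into true unvoiced sounds and silence. Monophthong recognition was then performed on the vowels classified in this way. Because each vowel is represented by a single frame, memory size and recognition time were reduced. Newly defined "up & down" and modified zero-crossing rate features were applied, with satisfactory results. LPC parameters and the power spectrum were also used as features for monophthong recognition, and the performance of each feature was compared. A two-stage recognition that combines these features appropriately achieved a high recognition rate of 92%.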
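
Three of the frame-level features named in this abstract (zero-crossing rate, log energy, and the normalized first-order correlation coefficient) have widely used textbook definitions, sketched below. The abstract does not give the paper's exact formulas, frame length, or thresholds, so the definitions, function names, and the `floor` parameter here are assumptions for illustration only; the paper's "up & down" and modified zero-crossing rate features are not reproduced.

```python
import math


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(frame) - 1)


def log_energy(frame, floor=1e-12):
    """Base-10 log of the frame energy; floor avoids log(0) on silence."""
    return math.log10(sum(s * s for s in frame) + floor)


def normalized_autocorr1(frame):
    """Lag-1 autocorrelation divided by the lag-0 autocorrelation."""
    num = sum(a * b for a, b in zip(frame, frame[1:]))
    den = sum(s * s for s in frame)
    return num / den if den > 0 else 0.0


if __name__ == "__main__":
    # Toy frames: a rapidly alternating one and a constant one.
    noisy_like = [1.0, -1.0, 1.0, -1.0]
    smooth_like = [1.0, 1.0, 1.0, 1.0]
    for name, frame in (("alternating", noisy_like), ("constant", smooth_like)):
        print(name, zero_crossing_rate(frame), log_energy(frame),
              normalized_autocorr1(frame))
```

Voiced frames typically show a low zero-crossing rate, high energy, and a lag-1 correlation near 1, while unvoiced frames show the opposite pattern, which is what makes such features usable for a coarse voiced/unvoiced/silence decision.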


A study on the voice command recognition at the motion control in the industrial robot (산업용 로보트의 동작제어 명령어의 인식에 관한 연구)

  • 이순요;권규식;김홍태
    • Journal of the Ergonomics Society of Korea / v.10 no.1 / pp.3-10 / 1991
  • The teach pendant and keyboard have been used as input devices for control commands in human-robot systems. However, many problems occur when the user is a novice, so a speech recognition system is required for communication between a human and the robot. In this study, Korean voice commands (eight robot commands and ten digits) are described on the basis of broad phonetic analysis. Applying broad phonetic analysis, the phonemes of the voice commands are divided into phoneme groups with similar features, such as plosive, fricative, affricate, nasal, and glide sounds. The feature parameters and their ranges for detecting the phoneme groups are then found by the minimax method. The classification rules consist of combinations of the feature parameters, such as zero crossing rate (ZCR), log energy (LE), up and down (UD), and formant frequency, together with their ranges. Voice commands were recognized using the classification rules. The recognition rate was over 90 percent in this experiment. The experiment also showed that the recognition rate for digits was better than that for robot commands.


ACOUSTIC CHARACTERISTICS OF KOREAN TRADITIONAL SINGING VOICE: A PRELIMINARY REPORT

  • Moon, Seung-Jae
    • Proceedings of the KSPS conference / 1996.10a / pp.367-371 / 1996
  • Most Koreans agree that the Korean traditional singing voice has a very peculiar sound compared to the Western singing voice. The goal of this paper is to investigate the acoustic characteristics of the Korean traditional singing voice called 'Pansori'. Materials from three male and four female professional singers are analyzed. Their singing was compared with their own conversation and with the conversation of non-singers. Long-term average spectra indicated that all the singers showed much less spectral tilt than non-singers. This held for the professional singers not only in their singing but also in their conversation, which suggests that it is not the result of a temporary effort but may involve a certain permanent change in their physiological configuration. (To assess this hypothesis, the voice source should be examined directly. Therefore, the use of a Rothenberg mask (Rothenberg, 1973) is strongly recommended in further research.) In addition to the long-term average spectra, individual vowel formants will be studied later.


A Study of English Loanwords

  • Lee, Hae-Bong
    • Proceedings of the KSPS conference / 2000.07a / pp.365-365 / 2000
  • English segments adopted into Korean can be divided into three types. Some English segments /m, n, ŋ, pʰ, tʰ, kʰ/ are adopted as the original sounds [m, n, ŋ, pʰ, tʰ, kʰ] in Korean. Other segments /b, d, g/ appear in the voiceless stop form [p, t, k]. Generative Phonology explains the presence of the above English segments in Korean, but it cannot explain why the English segments /f, v, θ, ž, č, ǰ/ disappear during the adoption process. I present a set of universal constraints from the Optimality Theory proposed by Prince and Smolensky (1993), and I show how the different ways in which English segments are adopted into Korean can be explained by universal constraints such as Faith(feature), NoAffricateStop, Faith(nasal), NoNasalStop, Faith(voice), and NoVoicedStop, and by the interaction of these constraints. I conclude that Optimality Theory provides insights that better capture the nature of the phonological phenomena of English segments in Korean.


ENGLISH RESTRUCTURING AND A USE OF MUSIC IN TEACHING ENGLISH PRONUNCIATION

  • Kim, Key-Seop
    • Proceedings of the KSPS conference / 2000.07a / pp.117-134 / 2000
  • Kim, Key-Seop (2000). English Restructuring and A Use of Music in Teaching English Pronunciation. JSEP 2000 voU This study has two aims: to clarify how English is restructured in utterance, and to relate that restructuring to the teaching of English pronunciation for listening and speaking through music and song, by suggesting a model syllabus for a 10-15 minute pronunciation segment in every class period. In general, English utterances are restructured by stress-timed rhythm, irrespective of syntactic boundaries. The rhythmic units are therefore arranged in isochronous groups, each formed by attaching clitic(s) to a host or head, often leftwards and sometimes rightwards. This results in linking, contraction, reduction, sound change, and rhythm adjustment in utterance, just as in music and song. With a focus on English restructuring, a model English pronunciation class syllabus is proposed for use in every period of a lesson or unit. It tries to relate the focused factor(s) in pronunciation to the integrated, making use of teaching techniques and music.
