• Title/Summary/Keyword: phoneme

Korean Phonological Viseme for Lip Synch Based on Phoneme Recognition (음소인식 기반의 립싱크 구현을 위한 한국어 음운학적 Viseme의 제안)

  • Joo Heeyeol;Kang Sunmee;Ko Hanseok
    • Proceedings of the Acoustical Society of Korea Conference / spring / pp.70-73 / 1999
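  • This paper proposes visemes (visual phonemes), an essential element of lip sync driven by real-time phoneme recognition for Korean, derived through a phonological approach to Korean. It also reports vowel recognition experiments and an analysis of their results, since vowels have the decisive influence on lip shape in lip sync. In the vowel recognition experiments, each of the 51 Korean phonemes is modeled as a three-state CHMM (Continuous Hidden Markov Model), and a phoneme network in which the phoneme models are connected in parallel is used. Features are extracted from the input speech as 12th-order MFCCs, the Viterbi algorithm is used for recognition, and phoneme sequence rules with a structure similar to a bigram grammar are applied during recognition to improve both recognition rate and recognition speed.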
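
The decoding step described in this abstract can be illustrated with a small log-domain Viterbi search. This is only a sketch under simplifying assumptions, not the authors' implementation: each phoneme is collapsed to a single state rather than a three-state CHMM, per-frame log-likelihoods are assumed to be precomputed (in the paper they would come from the acoustic models over MFCC features), and the names `viterbi`, `frame_loglik`, `log_trans`, and `log_init` are hypothetical. Leaving a phoneme pair out of `log_trans` forbids that transition, which loosely mimics a bigram-like phoneme sequence rule.

```python
import math


def viterbi(frame_loglik, log_trans, log_init):
    """Return the best state path and its log score.

    frame_loglik: list of dicts (one per frame), state -> log-likelihood
    log_trans:    dict (prev, cur) -> log transition probability;
                  pairs that are absent are treated as forbidden
    log_init:     dict state -> log initial probability
    Assumes at least one path has a finite score.
    """
    states = list(log_init)
    neg_inf = float("-inf")

    # Scores for the first frame.
    best = {s: log_init[s] + frame_loglik[0][s] for s in states}
    backpointers = []

    for obs in frame_loglik[1:]:
        new_best, pointers = {}, {}
        for cur in states:
            prev_best, score = None, neg_inf
            for prev in states:
                cand = best[prev] + log_trans.get((prev, cur), neg_inf)
                if cand > score:
                    prev_best, score = prev, cand
            new_best[cur] = score + obs[cur]
            pointers[cur] = prev_best
        best = new_best
        backpointers.append(pointers)

    # Trace back from the best final state.
    last = max(best, key=best.get)
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, best[last]


# Toy two-phoneme example; all probabilities are invented for illustration.
log_init = {"a": math.log(0.5), "m": math.log(0.5)}
log_trans = {
    ("a", "a"): math.log(0.6), ("a", "m"): math.log(0.4),
    ("m", "m"): math.log(0.6), ("m", "a"): math.log(0.4),
}
frames = [
    {"a": math.log(0.9), "m": math.log(0.1)},
    {"a": math.log(0.8), "m": math.log(0.2)},
    {"a": math.log(0.1), "m": math.log(0.9)},
]
path, score = viterbi(frames, log_trans, log_init)
```

Restricting `log_trans` prunes candidate transitions, which is one plausible reading of how sequence rules could improve both accuracy and speed; the abstract does not spell out the mechanism.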

A study on the voice command recognition at the motion control in the industrial robot (산업용 로보트의 동작제어 명령어의 인식에 관한 연구)

  • 이순요;권규식;김홍태
    • Journal of the Ergonomics Society of Korea / v.10 no.1 / pp.3-10 / 1991
  • The teach pendant and keyboard have been used as input devices for control commands in human-robot systems, but many problems occur when the user is a novice. A speech recognition system is therefore required for communication between a human and the robot. In this study, Korean voice commands, comprising eight robot commands and ten digits, are described on the basis of broad phonetic analysis. Applying broad phonetic analysis, the phonemes of the voice commands are divided into phoneme groups with similar features, such as plosive, fricative, affricate, nasal, and glide sounds. The feature parameters and their ranges for detecting the phoneme groups are then found by the minimax method. The classification rules consist of combinations of the feature parameters, such as zero crossing rate (ZCR), log energy (LE), up and down (UD), and formant frequency, together with their ranges. Voice commands were recognized by the classification rules. The recognition rate was over 90 percent in this experiment. The experiment also showed that the recognition rate for digits was higher than that for robot commands.
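
Two of the feature parameters named in this abstract, zero crossing rate and log energy, are simple to compute per frame. The sketch below shows one common way to define them and a toy threshold rule built on top. The thresholds, the class labels, and the function names are illustrative assumptions; the abstract does not give the actual parameter ranges, and the UD and formant features are omitted.

```python
import math


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(frame) - 1)


def log_energy(frame, eps=1e-10):
    """Natural log of the mean squared amplitude (eps avoids log(0))."""
    energy = sum(x * x for x in frame) / len(frame)
    return math.log(energy + eps)


def classify_frame(frame, zcr_threshold=0.3, le_threshold=-4.0):
    """Toy range rule; thresholds are placeholders, not the paper's values."""
    if log_energy(frame) < le_threshold:
        return "silence"
    if zero_crossing_rate(frame) > zcr_threshold:
        return "fricative-like"  # noisy, high-ZCR frame
    return "voiced-like"         # periodic, low-ZCR frame


# Synthetic frames: a 100 Hz sine sampled at 8 kHz, and a rapidly alternating signal.
low_freq = [math.sin(2 * math.pi * 100 * n / 8000) for n in range(400)]
alternating = [0.5 if n % 2 == 0 else -0.5 for n in range(400)]

labels = [classify_frame(f) for f in (low_freq, alternating, [0.0] * 400)]
```

A real rule set in the spirit of the abstract would combine several such features and tune each range on labeled speech rather than use fixed guesses.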

The relationship between segmental production by Japanese learners of Korean and pronunciation evaluation (일본인 한국어 학습자의 분절음 실현과 발음 평가의 상관성)

  • Hong, Hyejin;Ryu, Hyuksu;Chung, Minhwa
    • Phonetics and Speech Sciences / v.6 no.4 / pp.101-108 / 2014
  • This study investigates the effects of Japanese learners' Korean segmental production on pronunciation evaluation by native Korean raters. Read speech from 24 learners whose native language is Japanese is transcribed at the phonemic level, and confusion matrices are generated from the phonemic transcriptions. Deviations from the canonical pronunciation found in the learners' speech are analyzed in terms of phoneme substitutions, vowel insertions, and consonant deletions. Each learner's pronunciation is rated impressionistically by 5 native Korean raters. The results show that deviation from the canonical pronunciation is strongly correlated with the pronunciation evaluation scores. In particular, the rates of phoneme substitutions and vowel insertions are very strongly correlated with the pronunciation evaluation scores.
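
The analysis outlined in this abstract, counting deviations from canonical phonemes and relating their rate to rater scores, can be sketched as follows. The sketch assumes the canonical and produced phonemes are already aligned one to one, so it covers substitutions only, not insertions or deletions. The abstract does not say which correlation measure was used; Pearson's r is an assumption here. All data values and function names are invented for illustration.

```python
import math
from collections import Counter


def confusion_counts(pairs):
    """Count (canonical, produced) phoneme pairs, i.e. a sparse confusion matrix."""
    return Counter(pairs)


def substitution_rate(pairs):
    """Share of aligned phonemes produced differently from the canonical form."""
    if not pairs:
        return 0.0
    wrong = sum(1 for canonical, produced in pairs if canonical != produced)
    return wrong / len(pairs)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Invented alignment for one learner: (canonical, produced).
pairs = [("k", "k"), ("eo", "o"), ("n", "n"), ("p", "p")]
matrix = confusion_counts(pairs)
rate = substitution_rate(pairs)

# Invented per-learner substitution rates and mean rater scores.
rates = [0.1, 0.2, 0.3, 0.4]
scores = [4.5, 4.0, 3.5, 3.0]
r = pearson(rates, scores)
```

With real data, a strongly negative `r` would correspond to the pattern the abstract reports: more deviations go with lower ratings.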

The Development of Phonological Awareness in Children (아동의 음운인식 발달)

  • Park, Hyang Ah
    • Korean Journal of Child Studies / v.21 no.1 / pp.35-44 / 2000
  • This study examined the development of phonological awareness in 3-, 5-, and 7-year-old children, with 20 subjects at each age level. The 3-year-olds were given 2 phoneme detection tasks, and the 5- and 7-year-olds were given 5 phoneme detection tasks. In each task, the children first heard a target syllable together with 2 other syllables and were asked to tell which of the 2 syllables sounded similar to the target. Children were able to detect relatively large segments ($Consonant_1+Vowel$ or $Vowel+Consonant_2$: $C_1V$ or $VC_2$) at the age of 3 and gradually progressed to smaller sound segments (e.g., phonemes). The study indicated that Korean children detect $C_1V$ segments better than $VC_2$ segments and detect the initial consonant better than the middle vowel and the final consonant.

Ambiguity Types of the Homonymic & Heterographic Units for Improving Korean Voice Recognition System - a Preliminary Research (한국어 음성인식 시스템 향상을 위한 동음이철 단위의 중의성 유형 분류)

  • Yoon, Ae-Sun;Kang, Mi-Young
    • Speech Sciences / v.15 no.4 / pp.67-81 / 2008
  • The accuracy rate of P2G (Phoneme-to-Grapheme) conversion is one of the important factors determining the quality of unlimited voice recognition (VR) systems. Few studies, however, have been conducted to reduce the ambiguities of a phoneme string, which can be segmented into a variety of different linguistic units (i.e. morphemes, words, eo-jeols) and thus be transformed into more than one grapheme string. This paper is a preliminary study for building a large knowledge base of such homonymic & heterographic units (HHUs), which will provide unlimited Korean VR systems with more accurate P2G information. The paper analyzes 2 main factors generating HHUs: (1) boundary determination of the prosodic unit; (2) its segmentation into linguistic units. The linguistic characteristics determining the variable boundaries of a prosodic unit are investigated, and the ambiguity types of HHUs are classified according to their morphological and syntactic structures as well as the phonological rules governing them.

Grapheme-to-Phoneme Conversion and Prosody Modeling for Korean Conversational Style TTS (한국어 대화체 TTS 개발을 위한 발음 및 운율 추정)

  • Lee, Jin-Sik;Kim, Seung-Won;Kim, Byeong-Chang;Lee, Geun-Bae
    • Proceedings of the KSPS conference / 2006.11a / pp.135-138 / 2006
  • In this paper, we introduce a method for extracting grapheme-to-phoneme conversion rules from the transcription of a speech synthesis database and a prosody modeling method using a light version of ToBI for a Korean conversational style TTS. We focused on representing the characteristics of the conversational speech style, and the experimental results show that our proposed methods are suitable for developing a Korean conversational style TTS.

A Study on the Text-to-Speech Conversion Using the Formant Synthesis Method (포만트 합성방식을 이용한 문자-음성 변환에 관한 연구)

  • Choi, Jin-San;Kim, Yin-Nyun;See, Jeong-Wook;Bae, Geun-Sune
    • Speech Sciences / v.2 / pp.9-23 / 1997
  • Through iterative analysis and synthesis experiments on Korean monosyllables, a Korean text-to-speech system was implemented using the phoneme-based formant synthesis method. Since the formants of initial and final consonants in this system showed many variations depending on the medial vowels, the database for each phoneme was made up of formants depending on the medial vowels as well as duration information for the transition regions. These techniques were needed to improve the intelligibility of the synthetic speech. This paper also investigates methods of concatenating the synthesis units to improve the quality of the synthetic speech.

Multi-stage Speech Recognition Using Confidence Vector (신뢰도 벡터 기반의 다단계 음성인식)

  • Jeon, Hyung-Bae;Hwang, Kyu-Woong;Chung, Hoon;Kim, Seung-Hi;Park, Jun;Lee, Yun-Keun
    • MALSORI / no.63 / pp.113-124 / 2007
  • In this paper, we propose the use of a confidence vector as an intermediate input feature in a multi-stage speech recognition architecture to improve recognition accuracy. The multi-stage speech recognition structure was introduced as a method of reducing the computational complexity of the decoding procedure and thereby achieving faster speech recognition. Conventional multi-stage speech recognition is usually composed of three stages: acoustic search, lexical search, and acoustic re-scoring. In this paper, we focus on improving the accuracy of the lexical decoding by introducing a confidence vector as the input feature instead of the phoneme that is typically used. We conduct experiments on a 220K Korean Point-of-Interest (POI) domain, and the results show that the proposed method contributes to improving accuracy.

Teaching Pronunciation Using Sound Visualization Technology to EFL Learners

  • Min, Su-Jung;Pak, Hubert H.
    • English Language & Literature Teaching / v.13 no.2 / pp.129-153 / 2007
  • When English language teachers are deciding on their priorities for teaching pronunciation, it is imperative to know what kinds of differences and errors are most likely to interfere with communication, and what special problems particular first-language speakers will have with English pronunciation. In other words, phoneme discrimination skill is an integral part of speech processing for EFL learners learning to converse in English. Training using sound visualization techniques can be effective in improving second language learners' perception and production of segmental and suprasegmental speech contrasts. This study assessed the efficacy of pronunciation training that provided visual feedback for EFL learners acquiring pitch and durational contrasts to produce and perceive English phonemic distinctions. The subjects' ability to produce and to perceive novel English words was tested in two contexts before and after training: words in isolation and words in sentences. In comparison with an untrained control group, trainees showed improved perceptual and productive performance, transferred their knowledge to new contexts, and maintained their improvement three months after training. These findings support the feasibility of learner-centered programs using sound visualization techniques for English language pronunciation instruction.

A Method of the Extraction of Phonemes in Hangeul Recognition (한글 인식에 있어서의 자소추출)

  • ;市川忠男, 藤田廣一
    • Journal of the Korean Institute of Telematics and Electronics / v.18 no.2 / pp.36-43 / 1981
  • This paper describes a method of extracting phonemes in Hangeul recognition. We provide the direction of strokes and positional information for analyzing the structure of characters, based on the regular combinational rules of Hangeul and following top-down processing, and show the process of phoneme extraction sequentially. Several processing algorithms are described and simulated. The phoneme extraction experiment is carried out on 677 characters in actual daily use, and an extraction rate of 96% is obtained. The experimental results demonstrate the effectiveness of the proposed method.
