• Title/Summary/Keyword: vowel system

Search results: 142

An Implementation of Unlimited Speech Recognition and Synthesis System using Transcription of Roman to Hangul (영한 음차 변환을 이용한 무제한 음성인식 및 합성기의 구현)

  • 양원렬;윤재선;홍광석
    • Proceedings of the Korea Institute of Convergence Signal Processing / 2000.08a / pp.181-184 / 2000
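  • In this paper, we implement a speech recognizer and synthesizer that use English-to-Korean transliteration. For speech recognition, we use CV (consonant-vowel), VCCV, VCV, VV, and VC units. An unlimited-vocabulary speech recognition system is built by combining models constructed in advance for each of these units. With English-to-Korean transliteration, even an English word can be recognized by converting it to its Hangul pronunciation and then generating the corresponding model. For the speech synthesizer, we build the Korean speech database needed for synthesis and generate synthesized speech by concatenating its units according to the input text. When English is input, its pronunciation is first converted to Hangul through English-to-Korean transliteration, so synthesized speech can be generated without a separate English synthesizer.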


Variational autoencoder for prosody-based speaker recognition

  • Starlet Ben Alex;Leena Mary
    • ETRI Journal / v.45 no.4 / pp.678-689 / 2023
  • This paper describes a novel end-to-end deep generative model-based speaker recognition system using prosodic features. The usefulness of variational autoencoders (VAE) in learning the speaker-specific prosody representations for the speaker recognition task is examined herein for the first time. The speech signal is first automatically segmented into syllable-like units using vowel onset points (VOP) and energy valleys. Prosodic features, such as the dynamics of duration, energy, and fundamental frequency (F0), are then extracted at the syllable level and used to train/adapt a speaker-dependent VAE from a universal VAE. The initial comparative studies on VAEs and traditional autoencoders (AE) suggest that the former can efficiently learn speaker representations. Investigations on the impact of gender information in speaker recognition also point out that gender-dependent impostor banks lead to higher accuracies. Finally, the evaluation on the NIST SRE 2010 dataset demonstrates the usefulness of the proposed approach for speaker recognition.

A Study on Construction and Implementation of Web education System with Chinese conversion rule set (중국어 규칙변환 웹 교육시스템 설계 및 구현에 관한 연구)

  • Lee, Ji Hyun;Lee, Eun Ryoung
    • Journal of Digital Contents Society / v.17 no.4 / pp.227-234 / 2016
  • When Chinese characters were adopted in Korea, their pronunciations came with them, so many Sino-Korean characters today are pronounced similarly to Chinese. However, because the Korean and Chinese pronunciations were preserved and developed in different writing systems, their written forms differ. This study constructs and implements an easy way to learn Chinese pronunciation by creating a conversion rule set among Chinese pronunciation, Hanyu Pinyin (Latin script), and Sino-Korean character pronunciation, which consists of an initial sound, a medial vowel, and a final consonant. Web and application versions of this conversion-rule education system were built to enhance Chinese education.
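
The three-part syllable structure mentioned in this abstract (initial sound, medial vowel, final consonant) can be illustrated with the standard Unicode arithmetic for precomposed Hangul syllables. This is a minimal sketch of the decomposition step only, not the paper's conversion rule set; the function names `decompose` and `compose` are hypothetical.

```python
# Precomposed Hangul syllables occupy U+AC00..U+D7A3 and are laid out as
# (initial * 21 + medial) * 28 + final, following the Unicode Hangul algorithm.
HANGUL_BASE = 0xAC00
N_INITIALS = 19
N_MEDIALS = 21
N_FINALS = 28  # index 0 means "no final consonant"


def decompose(syllable):
    """Return (initial, medial, final) indices for one Hangul syllable."""
    offset = ord(syllable) - HANGUL_BASE
    if not 0 <= offset < N_INITIALS * N_MEDIALS * N_FINALS:
        raise ValueError(f"not a precomposed Hangul syllable: {syllable!r}")
    initial, rest = divmod(offset, N_MEDIALS * N_FINALS)
    medial, final = divmod(rest, N_FINALS)
    return initial, medial, final


def compose(initial, medial, final=0):
    """Inverse of decompose: build the syllable from its three indices."""
    return chr(HANGUL_BASE + (initial * N_MEDIALS + medial) * N_FINALS + final)
```

A rule set of the kind the abstract describes would presumably map each of the three indices to a Pinyin initial or final; that mapping is not given in the source and is not attempted here.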

Adaptive Background Modeling Considering Stationary Object and Object Detection Technique based on Multiple Gaussian Distribution

  • Jeong, Jongmyeon;Choi, Jiyun
    • Journal of the Korea Society of Computer and Information / v.23 no.11 / pp.51-57 / 2018
  • In this paper, we study parameter extraction for, and the implementation of, a speechreading system that recognizes the eight Korean vowels. Facial features are detected by amplifying and reducing image values and by comparing the values the image takes in various color spaces. The eye positions, the nose position, the inner boundary of the lips, the outer boundary of the upper lip, and the outer line of the teeth are located as features. From their analysis, the following are used as parameters: the area of the inner lip, the height and width of the inner lip, the ratio of the outer tooth-line length to the inner mouth area, and the distance between the nose and the outer boundary of the upper lip. A total of 2,400 data samples were gathered and analyzed. Based on this analysis, a neural network was constructed and recognition experiments were performed. Five normal subjects were sampled, and the observational error between samples was corrected by normalization. The experiments show very encouraging results regarding the usefulness of the parameters.

Implementation of TTS Engine for Natural Voice (자연음 TTS(Text-To-Speech) 엔진 구현)

  • Cho Jung-Ho;Kim Tae-Eun;Lim Jae-Hwan
    • Journal of Digital Contents Society / v.4 no.2 / pp.233-242 / 2003
  • A TTS (Text-To-Speech) system is a computer-based system that should be able to read any text aloud. Producing a natural voice requires general knowledge of the language and a great deal of time and effort. Furthermore, the sound pattern of English is variable, involving both phonemic and morphological analysis, which makes it very difficult to keep the patterns consistent. To handle these problems, we present a system based on phonemic analysis of vowels and consonants. By analyzing phonological variations frequently found in spoken English, we derived the phonemic contexts that trigger the multilevel application of the corresponding phonological processes, which consist of phonemic and allophonic rules. As a result, we obtained rule data organized by phoneme and an engine that economizes on system resources. The proposed system can be used not only in communication systems but also in office automation and similar applications.


Design of Korean eye-typing interfaces based on multilevel input system (단계식 입력 체계를 이용한 시선 추적 기반의 한글 입력 인터페이스 설계)

  • Kim, Hojoong;Woo, Sung-kyung;Lee, Kunwoo
    • Journal of the HCI Society of Korea / v.12 no.4 / pp.37-44 / 2017
  • Eye-typing is a kind of human-computer interactive input system driven by gaze location data. It is widely used as an input system for people with paralysis because it requires no physical motion other than eye movement. However, an eye-typing interface based on Korean characters has not yet been proposed. This research therefore aims to implement an eye-typing interface optimized for Korean. First, design objectives were established based on two features of eye-typing: significant noise and the Midas touch problem. A multilevel input system was introduced to deal with noise, and an area free of input buttons was added to solve the Midas touch problem. Then, two types of eye-typing interfaces were proposed based on the phonology of Korean, in which each syllable is formed by combining several phonemes. The two interfaces, named the consonant-vowel integrated interface and the separated interface, are designed to input Korean in phases through grouped phonemes. Finally, an evaluation was conducted, consisting of comparative experiments against the conventional Double-Korean keyboard interface and an analysis of gaze flow. The newly designed interfaces showed potential as practical tools for eye-typing.

The syllable recovery rule-based system and the application of a morphological analysis method for the post-processing of a continuous speech recognition (연속음성인식 후처리를 위한 음절 복원 rule-based 시스템과 형태소분석기법의 적용)

  • 박미성;김미진;김계성;최재혁;이상조
    • Journal of the Korean Institute of Telematics and Electronics C / v.36C no.3 / pp.47-56 / 1999
  • Various phonological alternations occur when Korean is pronounced continuously, and they are one of the major reasons Korean speech recognition is difficult. This paper presents a rule-based system that converts a character string produced by speech recognition into a text-based character string. The recovery results are morphologically analyzed, and only correct text strings are generated. Recovery follows four kinds of rules: a syllable-boundary final-consonant/initial-consonant recovery rule, a vowel-process recovery rule, a last-syllable final-consonant recovery rule, and a monosyllable process rule. We use x-clustering information for efficient recovery and postfix-syllable frequency information to restrict the recovery candidates passed to the morphological analyzer. Because the system is rule-based, it does not require a large pronunciation dictionary or phoneme dictionary, and it has the advantage that an existing text-based morphological analyzer can be used.


Hybrid Simulated Annealing for Data Clustering (데이터 클러스터링을 위한 혼합 시뮬레이티드 어닐링)

  • Kim, Sung-Soo;Baek, Jun-Young;Kang, Beom-Soo
    • Journal of Korean Society of Industrial and Systems Engineering / v.40 no.2 / pp.92-98 / 2017
  • Data clustering groups the patterns in a dataset using a similarity measure and is one of the most important and difficult techniques in data mining. Clustering can be formally regarded as a particular kind of NP-hard grouping problem. The K-means algorithm, although popular and efficient, is sensitive to initialization and can become stuck in a local optimum because it is a hill-climbing method. It is also not computationally feasible in practice, especially for large datasets and large numbers of clusters. A robust and efficient clustering algorithm that finds the global rather than a local optimum is therefore needed, especially now that large amounts of data are collected from many IoT (Internet of Things) devices. This paper proposes a new Hybrid Simulated Annealing (HSA) method that combines simulated annealing with K-means for non-hierarchical clustering of big data. Simulated annealing (SA) is useful for diversified search in a large search space, and K-means is useful for converged search in a predetermined search space. The proposed method balances intensification and diversification to find the global optimal solution in big data clustering. The performance of HSA is validated by experiment and analysis on the Iris, Wine, Glass, and Vowel datasets from the UCI Machine Learning Repository, in comparison with previous studies. The proposed KSAK (K-means+SA+K-means) and SAK (SA+K-means) outperform KSA (K-means+SA), SA, and K-means in the simulations. The method significantly improves the accuracy and efficiency of finding the global optimal clustering solution for complex, real-time, and costly data mining processes.
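
The SA-then-K-means ordering that this abstract calls SAK can be sketched roughly as follows. This is a toy illustration under stated assumptions, not the authors' HSA: the single-point relabeling move, the geometric cooling schedule, and every parameter value (`t0`, `cooling`, `steps`) are assumptions, and all function names are hypothetical.

```python
import math
import random


def centroid(members):
    """Component-wise mean of a non-empty list of points."""
    return [sum(col) / len(members) for col in zip(*members)]


def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def sse(points, labels, k):
    """Sum of squared distances from each point to its cluster centroid."""
    total = 0.0
    for c in range(k):
        members = [p for p, l in zip(points, labels) if l == c]
        if members:
            mu = centroid(members)
            total += sum(sq_dist(p, mu) for p in members)
    return total


def kmeans_refine(points, labels, k, iters=20):
    """Lloyd iterations started from an existing labeling (local search)."""
    labels = list(labels)
    for _ in range(iters):
        cents = {}
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # empty clusters simply drop out
                cents[c] = centroid(members)
        new = [min(cents, key=lambda c: sq_dist(p, cents[c])) for p in points]
        if new == labels:
            break
        labels = new
    return labels


def sa_then_kmeans(points, k, t0=1.0, cooling=0.95, steps=500, seed=0):
    """Diversify with simulated annealing, then intensify with K-means."""
    rng = random.Random(seed)
    labels = [rng.randrange(k) for _ in points]
    cost = sse(points, labels, k)
    best, best_cost = list(labels), cost
    t = t0
    for _ in range(steps):
        i = rng.randrange(len(points))
        old = labels[i]
        labels[i] = rng.randrange(k)  # move: relabel one random point
        new_cost = sse(points, labels, k)
        delta = new_cost - cost
        # Metropolis rule: always accept improvements, sometimes accept worse.
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            cost = new_cost
            if cost < best_cost:
                best, best_cost = list(labels), cost
        else:
            labels[i] = old
        t *= cooling
    refined = kmeans_refine(points, best, k)
    return refined, sse(points, refined, k)
```

Recomputing the full SSE after every move keeps the sketch short but costs O(n) per step; an incremental update would be the obvious change for larger data.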

Lip-reading System based on Bayesian Classifier (베이지안 분류를 이용한 립 리딩 시스템)

  • Kim, Seong-Woo;Cha, Kyung-Ae;Park, Se-Hyun
    • Journal of Korea Society of Industrial Information Systems / v.25 no.4 / pp.9-16 / 2020
  • Pronunciation recognition systems that use only video information and ignore voice information can be applied to various customized services. In this paper, we develop a system that applies a Bayesian classifier to distinguish Korean vowels via lip shapes in images. We extract feature vectors from the lip shapes of facial images and apply them to the designed machine learning model. Our experiments show that the system's recognition rate is 94% for the pronunciation of 'A', and the system's average recognition rate is approximately 84%, which is higher than that of the CNN tested for comparison. Our results show that our Bayesian classification method with feature values from lip region landmarks is efficient on a small training set. Therefore, it can be used for application development on limited hardware such as mobile devices.
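
The abstract does not say which Bayesian classifier is used, so the sketch below substitutes a plain Gaussian naive Bayes classifier to show the general idea of classifying vowels from a small set of lip-shape features. The two features (lip width and height), the toy values, and the function names `fit` and `predict` are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def fit(samples, labels):
    """Estimate per-class prior, feature means, and feature variances."""
    grouped = defaultdict(list)
    for x, y in zip(samples, labels):
        grouped[y].append(x)
    model = {}
    for y, rows in grouped.items():
        n = len(rows)
        means = [sum(col) / n for col in zip(*rows)]
        # Small constant keeps the variance strictly positive.
        variances = [sum((v - m) ** 2 for v in col) / n + 1e-6
                     for col, m in zip(zip(*rows), means)]
        model[y] = (n / len(samples), means, variances)
    return model


def predict(model, x):
    """Return the class with the highest log posterior for feature vector x."""
    def log_posterior(y):
        prior, means, variances = model[y]
        log_lik = sum(
            -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
            for v, m, var in zip(x, means, variances)
        )
        return math.log(prior) + log_lik
    return max(model, key=log_posterior)


# Hypothetical (lip width, lip height) measurements for two vowels.
train_x = [(5.0, 4.0), (5.2, 4.2), (4.8, 3.9), (2.0, 1.0), (2.2, 1.1), (1.9, 0.9)]
train_y = ["a", "a", "a", "u", "u", "u"]
vowel_model = fit(train_x, train_y)
```

A model of this kind has only a mean and a variance per feature and class to estimate, which is consistent with the abstract's remark that the approach works on a small training set.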

Comparison of Feature Performance in Off-line Handwritten Korean Alphabet Recognition (오프라인 필기체 한글 자소 인식에 있어서 특징성능의 비교)

  • Ko, Tae-Seog;Kim, Jong-Ryeol;Chung, Kyu-Sik
    • Korean Journal of Cognitive Science / v.7 no.1 / pp.57-74 / 1996
  • This paper compares the recognition performance of the features used in recent handwritten Korean character recognition. The research aims to provide a basis for feature selection in order to improve not only the recognition rate but also the efficiency of the recognition system. To compare feature performance, we analyzed the characteristics of these features and classified them into three types: global feature (image transformation) type, statistical feature type, and local/topological feature type. For each type, we selected four or five features that seem most suitable for representing the characteristics of the Korean alphabet and performed recognition experiments on the first consonant, horizontal vowel, and vertical vowel of a Korean character, respectively. The classifier used in the experiments is a multilayer perceptron with one hidden layer, trained with the backpropagation algorithm. The training and test data are taken from 30 sets of PE92. Experimental results show that 1) local/topological features outperform the other two feature types in recognition rate, and 2) mesh and projection features in the statistical type, Walsh and DCT features in the global type, and gradient and concavity features in the local/topological type outperform the others within each type.
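
Two of the statistical features named in this abstract, projection and mesh, can be sketched for a binary glyph image as below. The abstract gives no definitions, so these are the common textbook forms and may differ from the paper's; the function names and the assumption that the image dimensions divide evenly by the grid size are illustrative.

```python
def projection_features(image):
    """Row sums followed by column sums of a binary image (list of rows)."""
    row_sums = [sum(row) for row in image]
    col_sums = [sum(col) for col in zip(*image)]
    return row_sums + col_sums


def mesh_features(image, grid=2):
    """Fraction of foreground pixels in each cell of a grid x grid mesh.

    Assumes the image height and width are both divisible by `grid`.
    """
    height, width = len(image), len(image[0])
    cell_h, cell_w = height // grid, width // grid
    features = []
    for gy in range(grid):
        for gx in range(grid):
            cell = [image[y][x]
                    for y in range(gy * cell_h, (gy + 1) * cell_h)
                    for x in range(gx * cell_w, (gx + 1) * cell_w)]
            features.append(sum(cell) / len(cell))
    return features


# A 4x4 toy glyph: a single vertical stroke, as in a vertical vowel.
stroke = [
    [0, 1, 0, 0],
    [0, 1, 0, 0],
    [0, 1, 0, 0],
    [0, 1, 0, 0],
]
```

Either feature vector could then be fed to a classifier such as the multilayer perceptron the abstract mentions.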
