• Title/Summary/Keyword: Speaker recognition systems


Hidden Markov Models Containing Durational Information of States (상태의 고유시간 정보를 포함하는 Hidden Markov Model)

  • 조정호;홍재근;김수중
    • Journal of the Korean Institute of Telematics and Electronics / v.27 no.4 / pp.636-644 / 1990
  • Hidden Markov models (HMMs) are known to be a useful representation of speech signals and are used in a wide variety of speech systems. For speech recognition applications, it is desirable to incorporate durational information of states into the model, corresponding to the phonetic durations of speech segments. In this paper we propose duration-dependent HMMs that incorporate state duration information appropriately for the left-to-right model. Reestimation formulae for the parameters of the proposed model are derived and their convergence is verified. Finally, the performance of the proposed models is verified by applying them to an isolated-word, speaker-independent speech recognition system.
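The motivation for duration-dependent HMMs is that a standard HMM's self-loop implies a geometric state-duration distribution, which poorly matches real phonetic segment durations. A minimal Python sketch of that implied distribution (illustrative only; the paper's reestimation formulae are not reproduced here):

```python
def geometric_duration(a_ii, d):
    """Probability of staying exactly d frames in a state whose
    self-loop probability is a_ii (the standard HMM assumption)."""
    return (a_ii ** (d - 1)) * (1.0 - a_ii)

# With a_ii = 0.8 the most likely duration is always d = 1 frame,
# which rarely matches observed phonetic durations; duration-dependent
# models replace this implicit geometric distribution with an explicit one.
probs = [geometric_duration(0.8, d) for d in range(1, 200)]
```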


Semi-Continuous Hidden Markov Model with the MIN Module (MIN 모듈을 갖는 준연속 Hidden Markov Model)

  • Kim, Dae-Keuk;Lee, Jeong-Ju;Jeong, Ho-Kyoun;Lee, Sang-Hee
    • Speech Sciences / v.7 no.4 / pp.11-26 / 2000
  • In this paper, we propose an HMM with a MIN module. Because the initial and re-estimated variance vectors are important elements for performance in HMM recognition systems, we propose a method that compensates for the mismatched statistical features of training and test data. The MIN module function is a differentiable function similar to the sigmoid function. Unlike a continuous density function, it does not include variance vectors of the data set. The proposed hybrid HMM/MIN module is a unified network in which the observation probability in the HMM is replaced by the MIN module neural network. The parameters of the unified network are re-estimated by the gradient descent method under the Maximum Likelihood (ML) criterion. In estimating the parameters, the variance vector is not estimated, because there is no variance element in the MIN module function. Experiments were performed on isolated digits for speaker-independent recognition to compare the performance of the proposed HMM with that of the conventional HMM.
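The abstract does not give the exact form of the MIN module function, only that it is differentiable, sigmoid-like, and variance-free. A hypothetical sketch of such a variance-free observation score (an illustration of the idea, not the paper's function):

```python
def min_module_score(x, mean):
    """Hypothetical variance-free observation score: a smooth,
    sigmoid-like function of the squared distance to a reference
    mean vector. Illustrative only; no variance estimate is needed."""
    d2 = sum((xi - mi) ** 2 for xi, mi in zip(x, mean))
    return 1.0 / (1.0 + d2)   # differentiable, bounded in (0, 1]

near = min_module_score([0.1, 0.0], [0.0, 0.0])
far = min_module_score([2.0, 2.0], [0.0, 0.0])
```

Because the score depends only on the mean, no variance vector has to be re-estimated during training, which is the property the abstract emphasizes.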


Vector Quantization based Speech Recognition Performance Improvement using Maximum Log Likelihood in Gaussian Distribution (가우시안 분포에서 Maximum Log Likelihood를 이용한 벡터 양자화 기반 음성 인식 성능 향상)

  • Chung, Kyungyong;Oh, SangYeob
    • Journal of Digital Convergence / v.16 no.11 / pp.335-340 / 2018
  • Commercialized speech recognition systems with high recognition accuracy use learning models trained on speaker-dependent isolated data. However, their recognition performance degrades in noisy environments, depending on the quantity of available data. In this paper, we propose a vector quantization based method that improves speech recognition performance using the maximum log-likelihood under a Gaussian distribution. The proposed method builds the learning model configuration that best increases recognition accuracy for similar speech, combining vector quantization and maximum log-likelihood scoring with a speech feature extraction method based on the hidden Markov model. It can correct inaccurate speech models produced by the existing system and thus constitutes a robust model for speech recognition. The proposed method shows improved recognition accuracy in a speech recognition system.
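Maximum log-likelihood selection under a Gaussian can be sketched in a few lines: score a feature vector against each candidate model (e.g. a VQ codeword's Gaussian) and pick the best. A minimal sketch assuming diagonal covariances; the model names are illustrative:

```python
import math

def diag_gauss_loglik(x, mean, var):
    """Log-likelihood of x under a diagonal-covariance Gaussian."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )

def classify(x, models):
    """Pick the model with maximum log-likelihood for x."""
    return max(models, key=lambda name: diag_gauss_loglik(x, *models[name]))

# Hypothetical two-word example: (mean, variance) per model.
models = {
    "word_a": ([0.0, 0.0], [1.0, 1.0]),
    "word_b": ([3.0, 3.0], [1.0, 1.0]),
}
```

Working in the log domain avoids underflow when many feature dimensions or frames are combined.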

Development of Autonomous Mobile Robot with Speech Teaching Command Recognition System Based on Hidden Markov Model (HMM을 기반으로 한 자율이동로봇의 음성명령 인식시스템의 개발)

  • Cho, Hyeon-Soo;Park, Min-Gyu;Lee, Hyun-Jeong;Lee, Min-Cheol
    • Journal of Institute of Control, Robotics and Systems / v.13 no.8 / pp.726-734 / 2007
  • Generally, a mobile robot is moved by pre-programmed input. However, it is very hard for a non-expert to change the program that generates the moving path of a mobile robot, because he or she knows little about the teaching commands and operating methods for driving the robot. Therefore, a teaching method based on speech commands is increasingly required, so that handicapped persons without the use of their hands, or non-experts without specialist knowledge, can generate paths. In this study, to make teaching the moving path of an autonomous mobile robot easy, an autonomous mobile robot with a speech recognition function is developed. Using the human voice as the teaching method provides a more convenient user interface for the mobile robot. To implement the teaching function, the designed robot system is composed of three separate control modules: a speech preprocessing module, a DC servo motor control module, and a main control module. We design and implement a speaker-dependent isolated word recognition system for creating the moving path of an autonomous mobile robot in an unknown environment. The system uses word-level Hidden Markov Models (HMMs) for the designated command vocabulary to control the mobile robot, with postprocessing by a neural network applied conditionally, based on a confidence score. As the spectral analysis method, we use a filter-bank analysis model to extract voice features. The proposed word recognition system is tested using 33 Korean words for control of mobile robot navigation, and we also evaluate the navigation performance of the mobile robot using voice commands alone.
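The filter-bank front end mentioned above can be sketched generically: triangular filters spaced on the mel scale, applied to FFT magnitude spectra. The paper's exact analysis parameters are not given in the abstract, so this is a standard mel filter-bank construction, not the authors' specific one:

```python
import math

def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular mel-spaced filters over FFT bin centre frequencies."""
    lo, hi = hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0)
    mel_pts = [lo + i * (hi - lo) / (n_filters + 1) for i in range(n_filters + 2)]
    bins = [int(round(mel_to_hz(m) * n_fft / sample_rate)) for m in mel_pts]
    fbank = []
    for j in range(1, n_filters + 1):
        left, centre, right = bins[j - 1], bins[j], bins[j + 1]
        filt = [0.0] * (n_fft // 2 + 1)
        for k in range(left, centre):          # rising edge
            filt[k] = (k - left) / max(centre - left, 1)
        for k in range(centre, right):          # falling edge
            filt[k] = (right - k) / max(right - centre, 1)
        fbank.append(filt)
    return fbank
```

The per-frame feature vector is then the (log) energy under each triangle, which is what the word-level HMMs would consume.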

Robust Speech Parameters for the Emotional Speech Recognition (감정 음성 인식을 위한 강인한 음성 파라메터)

  • Lee, Guehyun;Kim, Weon-Goo
    • Journal of the Korean Institute of Intelligent Systems / v.22 no.6 / pp.681-686 / 2012
  • This paper studied speech parameters less affected by human emotion for the development of a robust emotional speech recognition system. For this purpose, the effect of emotion on the speech recognition system and robust speech parameters were studied using a speech database containing various emotions. In this study, the mel-cepstral coefficient, delta-cepstral coefficient, RASTA mel-cepstral coefficient, root-cepstral coefficient, PLP coefficient, and frequency-warped mel-cepstral coefficient under the vocal tract length normalization method were used as feature parameters, and CMS (Cepstral Mean Subtraction) and SBR (Signal Bias Removal) were used as signal bias removal techniques. Experimental results showed that the HMM-based speaker-independent word recognizer using the frequency-warped RASTA mel-cepstral coefficient under vocal tract length normalization, its derivatives, and CMS for signal bias removal showed the best performance.
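CMS, one of the bias removal techniques compared above, is simple enough to state exactly: subtract the per-dimension mean of the cepstral vectors over the utterance, which removes any stationary channel or bias component. A minimal sketch:

```python
def cepstral_mean_subtraction(frames):
    """CMS: subtract the per-dimension mean over all frames of an
    utterance; a stationary convolutional channel appears as an
    additive constant in the cepstral domain and is removed."""
    n = len(frames)
    dim = len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dim)]
    return [[f[d] - means[d] for d in range(dim)] for f in frames]
```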

Keyboard Solo System using a Real Time Hand Recognition Method (실시간 손 인식 기법을 인용한 건반 연주 시스템)

  • Lee, Eun-Kyung;Ha, Jung-Hee;Seo, Eun-Sung;Park, So-Young;Kim, Seong-Hoon
    • Journal of the Korea Institute of Information and Communication Engineering / v.13 no.11 / pp.2273-2276 / 2009
  • As parents are increasingly interested in music education for young children these days, they require systems that help children actively play music. In this paper, we propose a keyboard solo system using a real-time hand recognition method. To enable young children to use the system easily, the proposed system plays sounds whenever they move their fingers on a paper piano. To minimize the cost of playing music, the proposed system requires only a general PC with a paper piano, a web camera, and a speaker. To recognize both a hand and each key on the keyboard precisely and efficiently, the proposed system divides recognition into a hand recognition step and a keyboard recognition step. The hand recognition step considers only skin color, and the keyboard recognition step considers only black and white, ignoring other colors.
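Skin-color hand segmentation of the kind described above is commonly done with per-pixel color rules. The paper's actual thresholds are not given in the abstract; the following is a classic RGB skin heuristic (in the style of Kovac et al.), used only to illustrate the idea:

```python
def is_skin_rgb(r, g, b):
    """Classic RGB skin-colour rule (illustrative thresholds, not the
    paper's): bright enough, red-dominant, and not too grey."""
    return (r > 95 and g > 40 and b > 20
            and max(r, g, b) - min(r, g, b) > 15   # not greyish
            and abs(r - g) > 15
            and r > g and r > b)                   # red dominates
```

Applying such a predicate to every pixel yields a binary hand mask; the keyboard step would then use a separate black/white test on the remaining pixels.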

Interaction Intent Analysis of Multiple Persons using Nonverbal Behavior Features (인간의 비언어적 행동 특징을 이용한 다중 사용자의 상호작용 의도 분석)

  • Yun, Sang-Seok;Kim, Munsang;Choi, Mun-Taek;Song, Jae-Bok
    • Journal of Institute of Control, Robotics and Systems / v.19 no.8 / pp.738-744 / 2013
  • According to cognitive science research, the interaction intent of humans can be estimated through an analysis of their expressed behaviors. This paper proposes a novel methodology for reliable intention analysis of humans by applying this approach. To identify the intention, 8 behavioral features are extracted from 4 characteristics of human-human interaction, and we outline a set of core components of nonverbal human behavior. These nonverbal behaviors are associated with various recognition modules using multimodal sensors: localizing the speaker's sound source in the audition part, recognizing the frontal face and facial expression in the vision part, and estimating human trajectories, body pose and leaning, and hand gestures in the spatial part. As a post-processing step, temporal confidence reasoning is used to improve recognition performance, and an integrated human model quantitatively classifies the intention from the multi-dimensional cues by applying weight factors. Thus, interactive robots can make informed engagement decisions to interact effectively with multiple persons. Experimental results show that the proposed scheme works successfully between human users and a robot in human-robot interaction.
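The weighted fusion of multi-dimensional cues described above can be sketched as a weighted sum followed by a threshold. The cue names, weights, and threshold below are hypothetical, chosen only to illustrate the shape of such a classifier:

```python
def engagement_score(cues, weights):
    """Weighted fusion of nonverbal cue confidences (each in [0, 1]).
    Cue names and weights are illustrative, not the paper's."""
    return sum(weights[k] * cues.get(k, 0.0) for k in weights)

def wants_interaction(cues, weights, threshold=0.5):
    """Binary engagement decision from the fused score."""
    return engagement_score(cues, weights) >= threshold

weights = {"frontal_face": 0.4, "approach": 0.3, "gesture": 0.2, "speech": 0.1}
```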

Development of Fire Evacuation Guidance System using Characteristics of High Frequency and a Smart Phone (고주파 특성과 스마트폰을 활용한 화재 대피 안내시스템 개발)

  • Jeon, Yu-Jin;Jun, Yeon-Soo;Yeom, Chunho
    • Journal of the Korea Institute of Information and Communication Engineering / v.24 no.10 / pp.1376-1383 / 2020
  • Although studies on fire evacuation systems are increasing, studies on evacuating people from indoor spaces are insufficient. Recent research has suggested that the use of high-frequency sound may be effective for identifying the location of evacuees indoors. Accordingly, in this paper, the authors develop an evacuee location recognition technology and a fire evacuation guidance system using high-frequency sound and a smartphone. The entire system was developed, including an app server, an evacuee location recognition unit, an evacuation route search, an output unit, and a speaker unit, based on Wi-Fi communication. The experimental results demonstrated the potential effectiveness of the system on fire-situation data. This study is expected to serve as a foundational study for fire evacuation guidance systems using high-frequency data in case of fire.
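The evacuation route search component mentioned above can be sketched generically as a shortest-path search over a floor map that avoids fire-blocked cells. The paper's actual algorithm is not specified in the abstract; the following is a plain BFS on a grid as an illustration:

```python
from collections import deque

def evacuation_route(grid, start, exit_cell):
    """BFS shortest route on a grid where 1 marks blocked/fire cells
    and 0 is passable; returns the cell path or None if cut off."""
    rows, cols = len(grid), len(grid[0])
    prev = {start: None}
    q = deque([start])
    while q:
        r, c = q.popleft()
        if (r, c) == exit_cell:          # reconstruct path back to start
            path, cur = [], (r, c)
            while cur is not None:
                path.append(cur)
                cur = prev[cur]
            return path[::-1]
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols
                    and grid[nr][nc] == 0 and (nr, nc) not in prev):
                prev[(nr, nc)] = (r, c)
                q.append((nr, nc))
    return None
```

In a real system the grid would be rebuilt whenever sensors report new fire locations, and the evacuee's start cell would come from the high-frequency localization step.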

Speech Recognition and Its Learning by Neural Networks (신경회로망을 이용한 음성인식과 그 학습)

  • 이권현
    • The Journal of Korean Institute of Communications and Information Sciences / v.16 no.4 / pp.350-357 / 1991
  • A speech recognition system based on a neural network, which can be used for telephone number services, was tested. Because two different cardinal number systems, a native Korean one and a Sino-Korean one, are in use in Korea, the system must be able to recognize 22 discrete words. The neural network used had a two-layer structure; a three-layer structure with one hidden layer of 11, 22, or 44 hidden units was also tested. During the learning phase of the system, the so-called BP (back propagation) algorithm was applied. The learning process can be influenced by using different learning factors and also by the method of learning (for instance, random or cyclic presentation). The optimal speaker-independent recognition rate using a two-layer neural network was 96%. A drop in recognition was observed with overtraining, and this phenomenon appeared more clearly when a three-layer neural network was used. These phenomena are described in this paper in more detail. In particular, the influence of the construction of the neural network and the several states during the learning phase are examined.
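The BP algorithm mentioned above computes gradients layer by layer via the chain rule. A toy sketch for a single sigmoid output with squared error (a minimal illustration, not the 22-word system), together with the standard sanity check of comparing the analytic gradient against a finite difference:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, w_h, w_o):
    """Tiny net: inputs -> sigmoid hidden layer -> sigmoid output."""
    h = [sigmoid(sum(wi * xi for wi, xi in zip(row, x))) for row in w_h]
    y = sigmoid(sum(wo * hi for wo, hi in zip(w_o, h)))
    return h, y

def bp_grad_wo(x, t, w_h, w_o):
    """BP gradient of E = 0.5*(y - t)**2 w.r.t. the output weights:
    delta = dE/dnet_o for a sigmoid unit, then dE/dw_o[i] = delta*h[i]."""
    h, y = forward(x, w_h, w_o)
    delta = (y - t) * y * (1.0 - y)
    return [delta * hi for hi in h]
```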


Robust Speech Recognition Parameters for Emotional Variation (감정 변화에 강인한 음성 인식 파라메터)

  • Kim Weon-Goo
    • Journal of the Korean Institute of Intelligent Systems / v.15 no.6 / pp.655-660 / 2005
  • This paper studied feature parameters less affected by emotional variation for the development of robust speech recognition technologies. For this purpose, the effect of emotional variation on the speech recognition system and robust feature parameters were studied using a speech database containing various emotions. In this study, the LPC cepstral coefficient, mel-cepstral coefficient, root-cepstral coefficient, PLP coefficient, and RASTA mel-cepstral coefficient were used as feature parameters, and the CMS and SBR methods were used as signal bias removal techniques. Experimental results showed that the HMM-based speaker-independent word recognizer using the RASTA mel-cepstral coefficient and its derivatives with CMS for signal bias removal showed the best performance, with a 7.05% word error rate. This corresponds to about a 52% word error reduction compared to the performance of the baseline system using the mel-cepstral coefficient.
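The two figures reported in the abstract are consistent with each other: a 7.05% word error rate after an approximately 52% relative reduction implies a baseline around 14.7% WER, as a quick check shows:

```python
# 7.05% WER with "about a 52% word error reduction" implies the
# baseline mel-cepstral system's WER:
wer_best = 7.05
wer_baseline = wer_best / (1.0 - 0.52)   # about 14.7% WER
relative_reduction = (wer_baseline - wer_best) / wer_baseline
```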