• Title/Summary/Keyword: 은닉마코프모델

Search Result 82, Processing Time 0.021 seconds

Continuous Speech Recognition based on Parmetric Trajectory Segmental HMM (모수적 궤적 기반의 분절 HMM을 이용한 연속 음성 인식)

  • 윤영선;오영환
    • The Journal of the Acoustical Society of Korea
    • /
    • v.19 no.3
    • /
    • pp.35-44
    • /
    • 2000
  • In this paper, we propose a new trajectory model for characterizing segmental features and their interaction based upon a general framework of hidden Markov models. Each segment, a sequence of vectors, is represented by a trajectory of observed sequences. This trajectory is obtained by applying a new design matrix which includes transitional information on contiguous frames, and is characterized as a polynomial regression function. To apply the trajectory to the segmental HMM, the frame features are replaced with the trajectory of a given segment. We also propose the likelihood of a given segment and the estimation of trajectory parameters. The obervation probability of a given segment is represented as the relation between the segment likelihood and the estimation error of the trajectories. The estimation error of a trajectory is considered as the weight of the likelihood of a given segment in a state. This weight represents the probability of how well the corresponding trajectory characterize the segment. The proposed model can be regarded as a generalization of a conventional HMM and a parametric trajectory model. The experimental results are reported on the TIMIT corpus and performance is show to improve significantly over that of the conventional HMM.

  • PDF

An Implementation of Embedded Speaker Identifier for PDA (PDA를 위한 내장형 화자인증기의 구현)

  • Kim, Dong-Ju;Roh, Yong-Wan;Kim, Dong-Gyu;Chung, Kwang-Woo;Hong, Kwang-Seok
    • Proceedings of the Korea Institute of Convergence Signal Processing
    • /
    • 2005.11a
    • /
    • pp.286-289
    • /
    • 2005
  • 기존의 물리적 인증도구를 이용한 방식이나 패스워드 인증 방식은 분실, 도난, 해킹 등에 취약점을 가지고 있다. 따라서 지문, 서명, 홍채, 음성, 얼굴 등을 이용한 생체 인식기술을 보안 기술로 적용하려는 연구가 진행중이며 일부는 실용화도 되고 있다. 본 논문에서는 최근 널리 보급되어 있는 임베디드 시스템중의 하나인 PDA에 음성 기술을 이용한 내장형 화자 인증기를 구현하였다. 화자 인증기는 음성기술에서 널리 사용되고 있는 벡터 양자화 기술과 은닉 마코프 모델 기술을 사용하였으며, PDA의 하드웨어적인 제약 사항을 고려하여 사용되는 벡터 코드북을 두 가지로 다르게 하여 각각 구현하였다. 처음은 코드북을 화자 등록시에 발성음만을 이용하여 생성하고 화자인증 시에 이용하는 방법이며, 다른 하나는 대용량의 음성 데이터베이스를 이용하여 코드북을 사전에 생성하여 이를 화자 인증시에 이용하는 방법이다. 화자인증기의 성능평가는 5명의 화자가 10번씩 5개의 단어에 대하여 실험하여, 각각 화자종속 코득북을 이용한 인증기는 88.8%, 99.5%, 화자독립 코드북을 이용한 인증기는 85.6%, 95.5%의 인증율과 거절율을 보였으며, 93.5%와 90.0%의 평균 확률을 보였다.. 실험을 통하여 화자독립 인증기의 경우가 화자종속 인증기의 경우보다 낮은 인식율을 보였지만, 화자종속 인증기에서 나타나는 코드북 훈련시에 발생하는 메모리 문제를 해결 할 수 있었다.

  • PDF

Context-adaptive Phoneme Segmentation for a TTS Database (문자-음성 합성기의 데이터 베이스를 위한 문맥 적응 음소 분할)

  • 이기승;김정수
    • The Journal of the Acoustical Society of Korea
    • /
    • v.22 no.2
    • /
    • pp.135-144
    • /
    • 2003
  • A method for the automatic segmentation of speech signals is described. The method is dedicated to the construction of a large database for a Text-To-Speech (TTS) synthesis system. The main issue of the work involves the refinement of an initial estimation of phone boundaries which are provided by an alignment, based on a Hidden Market Model(HMM). Multi-layer perceptron (MLP) was used as a phone boundary detector. To increase the performance of segmentation, a technique which individually trains an MLP according to phonetic transition is proposed. The optimum partitioning of the entire phonetic transition space is constructed from the standpoint of minimizing the overall deviation from hand labelling positions. With single speaker stimuli, the experimental results showed that more than 95% of all phone boundaries have a boundary deviation from the reference position smaller than 20 ms, and the refinement of the boundaries reduces the root mean square error by about 25%.

Robust Speech Recognition Using Missing Data Theory (손실 데이터 이론을 이용한 강인한 음성 인식)

  • 김락용;조훈영;오영환
    • The Journal of the Acoustical Society of Korea
    • /
    • v.20 no.3
    • /
    • pp.56-62
    • /
    • 2001
  • In this paper, we adopt a missing data theory to speech recognition. It can be used in order to maintain high performance of speech recognizer when the missing data occurs. In general, hidden Markov model (HMM) is used as a stochastic classifier for speech recognition task. Acoustic events are represented by continuous probability density function in continuous density HMM(CDHMM). The missing data theory has an advantage that can be easily applicable to this CDHMM. A marginalization method is used for processing missing data because it has small complexity and is easy to apply to automatic speech recognition (ASR). Also, a spectral subtraction is used for detecting missing data. If the difference between the energy of speech and that of background noise is below given threshold value, we determine that missing has occurred. We propose a new method that examines the reliability of detected missing data using voicing probability. The voicing probability is used to find voiced frames. It is used to process the missing data in voiced region that has more redundant information than consonants. The experimental results showed that our method improves performance than baseline system that uses spectral subtraction method only. In 452 words isolated word recognition experiment, the proposed method using the voicing probability reduced the average word error rate by 12% in a typical noise situation.

  • PDF

Study on the Improvement of Speech Recognizer by Using Time Scale Modification (시간축 변환을 이용한 음성 인식기의 성능 향상에 관한 연구)

  • 이기승
    • The Journal of the Acoustical Society of Korea
    • /
    • v.23 no.6
    • /
    • pp.462-472
    • /
    • 2004
  • In this paper a method for compensating for thp performance degradation or automatic speech recognition (ASR) is proposed. which is mainly caused by speaking rate variation. Before the new method is proposed. quantitative analysis of the performance of an HMM-based ASR system according to speaking rate is first performed. From this analysis, significant performance degradation was often observed in the rapidly speaking speech signals. A quantitative measure is then introduced, which is able to represent speaking rate. Time scale modification (TSM) is employed to compensate the speaking rate difference between input speech signals and training speech signals. Finally, a method for compensating the performance degradation caused by speaking rate variation is proposed, in which TSM is selectively employed according to speaking rate. By the results from the ASR experiments devised for the 10-digits mobile phone number, it is confirmed that the error rate was reduced by 15.5% when the proposed method is applied to the high speaking rate speech signals.

An Extraction Method of Meaningful Hand Gesture for a Robot Control (로봇 제어를 위한 의미 있는 손동작 추출 방법)

  • Kim, Aram;Rhee, Sang-Yong
    • Journal of the Korean Institute of Intelligent Systems
    • /
    • v.27 no.2
    • /
    • pp.126-131
    • /
    • 2017
  • In this paper, we propose a method to extract meaningful motion among various kinds of hand gestures on giving commands to robots using hand gestures. On giving a command to the robot, the hand gestures of people can be divided into a preparation one, a main one, and a finishing one. The main motion is a meaningful one for transmitting a command to the robot in this process, and the other operation is a meaningless auxiliary operation to do the main motion. Therefore, it is necessary to extract only the main motion from the continuous hand gestures. In addition, people can move their hands unconsciously. These actions must also be judged by the robot with meaningless ones. In this study, we extract human skeleton data from a depth image obtained by using a Kinect v2 sensor and extract location data of hands data from them. By using the Kalman filter, we track the location of the hand and distinguish whether hand motion is meaningful or meaningless to recognize the hand gesture by using the hidden markov model.

Factored MLLR Adaptation for HMM-Based Speech Synthesis in Naval-IT Fusion Technology (인자화된 최대 공산선형회귀 적응기법을 적용한 해양IT융합기술을 위한 HMM기반 음성합성 시스템)

  • Sung, June Sig;Hong, Doo Hwa;Jeong, Min A;Lee, Yeonwoo;Lee, Seong Ro;Kim, Nam Soo
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.38C no.2
    • /
    • pp.213-218
    • /
    • 2013
  • One of the most popular approaches to parameter adaptation in hidden Markov model (HMM) based systems is the maximum likelihood linear regression (MLLR) technique. In our previous study, we proposed factored MLLR (FMLLR) where each MLLR parameter is defined as a function of a control vector. We presented a method to train the FMLLR parameters based on a general framework of the expectation-maximization (EM) algorithm. Using the proposed algorithm, supplementary information which cannot be included in the models is effectively reflected in the adaptation process. In this paper, we apply the FMLLR algorithm to a pitch sequence as well as spectrum parameters. In a series of experiments on artificial generation of expressive speech, we evaluate the performance of the FMLLR technique and also compare with other approaches to parameter adaptation in HMM-based speech synthesis.

Acoustic Signal-Based Tunnel Incident Detection System (음향신호 기반 터널 돌발상황 검지시스템)

  • Jang, Jinhwan
    • The Journal of The Korea Institute of Intelligent Transport Systems
    • /
    • v.18 no.5
    • /
    • pp.112-125
    • /
    • 2019
  • An acoustic signal-based, tunnel-incident detection system was developed and evaluated. The system was comprised of three components: algorithm, acoustic signal collector, and server system. The algorithm, which was based on nonnegative tensor factorization and a hidden Markov model, processes the acoustic signals to attenuate noise and detect incident-related signals. The acoustic signal collector gathers the tunnel sounds, digitalizes them, and transmits the digitalized acoustic signals to the center server. The server system issues an alert once the algorithm identifies an incident. The performance of the system was evaluated thoroughly in two steps: first, in a controlled tunnel environment using the recorded incident sounds, and second, in an uncontrolled tunnel environment using real-world incident sounds. As a result, the detection rates ranged from 80 to 95% at distances from 50 to 10 m in the controlled environment, and 94 % in the uncontrolled environment. The superiority of the developed system to the existing video image and loop detector-based systems lies in its instantaneous detection capability with less than 2 s.

Singing Voice Synthesis Using HMM Based TTS and MusicXML (HMM 기반 TTS와 MusicXML을 이용한 노래음 합성)

  • Khan, Najeeb Ullah;Lee, Jung-Chul
    • Journal of the Korea Society of Computer and Information
    • /
    • v.20 no.5
    • /
    • pp.53-63
    • /
    • 2015
  • Singing voice synthesis is the generation of a song using a computer given its lyrics and musical notes. Hidden Markov models (HMM) have been proved to be the models of choice for text to speech synthesis. HMMs have also been used for singing voice synthesis research, however, a huge database is needed for the training of HMMs for singing voice synthesis. And commercially available singing voice synthesis systems which use the piano roll music notation, needs to adopt the easy to read standard music notation which make it suitable for singing learning applications. To overcome this problem, we use a speech database for training context dependent HMMs, to be used for singing voice synthesis. Pitch and duration control methods have been devised to modify the parameters of the HMMs trained on speech, to be used as the synthesis units for the singing voice. This work describes a singing voice synthesis system which uses a MusicXML based music score editor as the front-end interface for entry of the notes and lyrics to be synthesized and a hidden Markov model based text to speech synthesis system as the back-end synthesizer. A perceptual test shows the feasibility of our proposed system.

Estimation and Weighting of Sub-band Reliability for Multi-band Speech Recognition (다중대역 음성인식을 위한 부대역 신뢰도의 추정 및 가중)

  • 조훈영;지상문;오영환
    • The Journal of the Acoustical Society of Korea
    • /
    • v.21 no.6
    • /
    • pp.552-558
    • /
    • 2002
  • Recently, based on the human speech recognition (HSR) model of Fletcher, the multi-band speech recognition has been intensively studied by many researchers. As a new automatic speech recognition (ASR) technique, the multi-band speech recognition splits the frequency domain into several sub-bands and recognizes each sub-band independently. The likelihood scores of sub-bands are weighted according to reliabilities of sub-bands and re-combined to make a final decision. This approach is known to be robust under noisy environments. When the noise is stationary a sub-band SNR can be estimated using the noise information in non-speech interval. However, if the noise is non-stationary it is not feasible to obtain the sub-band SNR. This paper proposes the inverse sub-band distance (ISD) weighting, where a distance of each sub-band is calculated by a stochastic matching of input feature vectors and hidden Markov models. The inverse distance is used as a sub-band weight. Experiments on 1500∼1800㎐ band-limited white noise and classical guitar sound revealed that the proposed method could represent the sub-band reliability effectively and improve the performance under both stationary and non-stationary band-limited noise environments.