Search results for "HMM decoder" (7 results)

A Robust Speaker Identification Using Optimized Confidence and Modified HMM Decoder (최적화된 관측 신뢰도와 변형된 HMM 디코더를 이용한 잡음에 강인한 화자식별 시스템)

  • Tariquzzaman, Md.; Kim, Jin-Young; Na, Seung-Yu
    • MALSORI / no.64 / pp.121-135 / 2007
  • Speech signals are distorted by channel characteristics or additive noise, which severely degrades the performance of speaker and speech recognition. To cope with the noise problem, we propose a modified HMM decoder algorithm using SNR-based observation confidence, which was successfully applied to GMM-based speaker identification. The modification weights the observation probabilities with reliability values obtained from the SNR. We also apply the particle swarm optimization (PSO) method to the confidence function to maximize speaker identification performance. To evaluate the proposed method, we used the ETRI database for speaker recognition. The experimental results showed that performance was clearly enhanced with the modified HMM decoder algorithm.

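The weighting described in the abstract can be sketched as a Viterbi pass in which each frame's log observation probability is scaled by a per-frame reliability weight. This is a minimal illustration under assumptions, not the paper's implementation; the exact form of the weighting and the HMM topology are not given in the abstract.

```python
import math

def viterbi_weighted(log_init, log_trans, log_obs, conf):
    """Viterbi decoding where each frame's log observation score is
    scaled by a reliability weight conf[t] in [0, 1], so that
    low-confidence (noisy) frames contribute less to the path score.
    log_obs[t][j] is log P(o_t | state j)."""
    T, N = len(log_obs), len(log_init)
    delta = [log_init[j] + conf[0] * log_obs[0][j] for j in range(N)]
    backptrs = []
    for t in range(1, T):
        new_delta, back = [], []
        for j in range(N):
            best_i = max(range(N), key=lambda i: delta[i] + log_trans[i][j])
            back.append(best_i)
            new_delta.append(delta[best_i] + log_trans[best_i][j]
                             + conf[t] * log_obs[t][j])
        delta = new_delta
        backptrs.append(back)
    # Backtrack from the best final state.
    path = [max(range(N), key=lambda j: delta[j])]
    for back in reversed(backptrs):
        path.append(back[path[-1]])
    path.reverse()
    return path
```

With all weights set to 1.0, this reduces to ordinary Viterbi decoding, which makes the baseline comparison in the paper straightforward.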

Modified HMM Decoder based on Observation Confidence for Speaker Identification (화자인식을 위한 관측신뢰도 기반 변형된 HMM 디코더)

  • Tariquzzaman, Md.; Min, So-Hui; Kim, Jin-Yeong; Na, Seung-Yu
    • Proceedings of the Korean Institute of Intelligent Systems Conference / 2007.11a / pp.443-446 / 2007
  • Speech signals are distorted by noise or by the characteristics of the transmission channel, and the distorted speech severely degrades the performance of speech and speaker recognition. To overcome this problem, this paper adapts the SNR-based confidence-weighting scheme previously applied to the Gaussian mixture model (GMM) [1][2] to the hidden Markov model (HMM) decoder. The HMM decoder is modified by weighting the per-state observation probabilities with the reliability values proposed in [1]. To verify the proposed method, text-dependent speaker identification experiments were conducted on a Korean mobile-phone speech database for speaker recognition built by ETRI. The experimental results confirm that the proposed method substantially improves the speaker identification rate over the conventional method.

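Neither abstract gives the confidence function itself; a sigmoid mapping from frame-level SNR to a reliability weight is one plausible minimal form. The `alpha` (slope) and `beta` (midpoint) parameters here are hypothetical, of the kind the companion work tunes with PSO.

```python
import math

def frame_confidence(snr_db, alpha=0.5, beta=10.0):
    """Map a frame-level SNR (in dB) to a reliability weight in (0, 1)
    via a sigmoid: clean frames (high SNR) approach 1.0, heavily
    corrupted frames approach 0.0.  alpha and beta are illustrative
    tunable parameters, not values from the paper."""
    return 1.0 / (1.0 + math.exp(-alpha * (snr_db - beta)))
```

The resulting per-frame weights would feed directly into a confidence-weighted decoder such as the one sketched for the previous entry.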

Design of a Korean Speech Recognition Platform (한국어 음성인식 플랫폼의 설계)

  • Kwon Oh-Wook; Kim Hoi-Rin; Yoo Changdong; Kim Bong-Wan; Lee Yong-Ju
    • MALSORI / no.51 / pp.151-165 / 2004
  • For educational and research purposes, a Korean speech recognition platform was designed. It is based on an object-oriented architecture and can be easily modified so that researchers can readily evaluate the performance of a recognition algorithm of interest. This platform will save development time for the many who are interested in speech recognition. The platform includes the following modules: noise reduction, end-point detection, mel-frequency cepstral coefficient (MFCC) and perceptual linear prediction (PLP)-based feature extraction, hidden Markov model (HMM)-based acoustic modeling, n-gram language modeling, n-best search, and Korean language processing. The decoder of the platform can handle both lexical search trees for large-vocabulary speech recognition and finite-state networks for small-to-medium-vocabulary speech recognition. It performs a word-dependent n-best search with a bigram language model in the first, forward search stage, and then extracts a word lattice and rescores each lattice path with a trigram language model in the second stage.

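The second pass of the two-stage search described above can be illustrated as follows: given lattice paths with acoustic scores, compute a trigram LM score for each path and re-rank. This is a simplified sketch; the LM weight, back-off floor for unseen trigrams, and path representation are assumptions.

```python
import math

def rescore_paths(paths, trigram_lp, lm_weight=1.0, floor=math.log(1e-6)):
    """Second-pass lattice rescoring sketch: score each candidate path
    with a trigram LM and re-rank by combined score.
    paths:      list of (word_list, acoustic_logprob) tuples
    trigram_lp: dict mapping (w1, w2, w3) -> log probability; unseen
                trigrams fall back to a fixed floor (an assumption)."""
    rescored = []
    for words, acoustic in paths:
        lm = 0.0
        h1, h2 = "<s>", "<s>"
        for w in words:
            lm += trigram_lp.get((h1, h2, w), floor)
            h1, h2 = h2, w
        rescored.append((acoustic + lm_weight * lm, words))
    return sorted(rescored, reverse=True)  # best path first
```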

Visual analysis of attention-based end-to-end speech recognition (어텐션 기반 엔드투엔드 음성인식 시각화 분석)

  • Lim, Seongmin; Goo, Jahyun; Kim, Hoirin
    • Phonetics and Speech Sciences / v.11 no.1 / pp.41-49 / 2019
  • An end-to-end speech recognition model consisting of a single integrated neural network was recently proposed. The end-to-end model does not need several separate training steps, and its structure is easy to understand. However, it is difficult to understand how the model recognizes speech internally. In this paper, we visualized and analyzed the attention-based end-to-end model to elucidate its internal mechanisms. We compared the acoustic model of a BLSTM-HMM hybrid system with the encoder of the end-to-end model, visualizing both with t-SNE to examine the differences between neural network layers. As a result, we were able to delineate the difference between the acoustic model and the end-to-end encoder. Additionally, we analyzed the decoder of the end-to-end model from a language-model perspective. Finally, we found that improving the decoder of the end-to-end model is necessary to yield higher performance.
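As a numeric stand-in for what the paper's t-SNE plots show visually, one can measure how cleanly a layer's activations cluster by label, e.g. mean within-class minus mean between-class cosine similarity. This is a simplified proxy of my own, not the paper's method; the activation vectors and labels below are toy assumptions.

```python
import math

def cosine(u, v):
    """Cosine similarity between two activation vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def class_separation(acts, labels):
    """Mean within-class minus mean between-class cosine similarity;
    a higher value means the layer separates the label classes more
    cleanly, which is roughly what distinct clusters in a t-SNE plot
    indicate."""
    within, between = [], []
    for i in range(len(acts)):
        for j in range(i + 1, len(acts)):
            bucket = within if labels[i] == labels[j] else between
            bucket.append(cosine(acts[i], acts[j]))
    return sum(within) / len(within) - sum(between) / len(between)
```

Applied layer by layer, such a score would let one compare the hybrid acoustic model against the end-to-end encoder quantitatively rather than by eye.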

Dynamic Bayesian Network-Based Gait Analysis (동적 베이스망 기반의 걸음걸이 분석)

  • Kim, Chan-Young; Sin, Bong-Kee
    • Journal of KIISE: Software and Applications / v.37 no.5 / pp.354-362 / 2010
  • This paper proposes a new method for the hierarchical analysis of human gait that divides the motion into gait direction and gait posture using dynamic Bayesian networks (DBNs). Based on the factorial HMM (FHMM), a type of DBN, we design a Gait Motion Decoder (GMD) with a circular state-space architecture, which fits human walking behavior nicely. Most previous studies focused on human identification, were limited to certain viewing angles, and forwent modeling of the walking action itself. In contrast, this work models pedestrian pose and posture explicitly and separately in order to recognize gait direction and detect orientation changes. Experimental results showed 96.5% accuracy in pose identification. This work is among the first efforts to analyze gait motion in terms of gait pose and gait posture, and it could be applied to a broad class of human activities in a number of situations.
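The circular state-space architecture can be sketched as a ring-shaped transition matrix over quantized heading directions: each state keeps most of its mass and splits the remainder between its two ring neighbours, matching the gradual heading changes of a walking person. The state count and probabilities below are illustrative assumptions, not values from the paper.

```python
def circular_transitions(n_states, p_stay=0.8):
    """Transition matrix for a circular state space: state i keeps
    p_stay and sends (1 - p_stay) / 2 to each of its ring neighbours
    (i - 1) mod n and (i + 1) mod n, so the chain can only drift
    smoothly around the ring of heading directions."""
    p_move = (1.0 - p_stay) / 2.0
    A = [[0.0] * n_states for _ in range(n_states)]
    for i in range(n_states):
        A[i][i] = p_stay
        A[i][(i - 1) % n_states] = p_move
        A[i][(i + 1) % n_states] = p_move
    return A
```

In an FHMM, one such chain could model gait direction while a second, independent chain models gait posture, with the observation depending on both hidden states.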

Incorporation of IMM-based Feature Compensation and Uncertainty Decoding (IMM 기반 특징 보상 기법과 불확실성 디코딩의 결합)

  • Kang, Shin-Jae; Han, Chang-Woo; Kwon, Ki-Soo; Kim, Nam-Soo
    • The Journal of Korean Institute of Communications and Information Sciences / v.37 no.6C / pp.492-496 / 2012
  • This paper presents a decoding technique for speech recognition that uses uncertainty information from a feature compensation method to improve recognition performance in low-SNR conditions. Traditional feature compensation algorithms have difficulty estimating clean feature parameters in adverse environments because they focus on point estimation of the desired features. Such point estimates degrade recognition performance when incorrectly estimated features enter the speech recognition decoder. In this paper, we feed the uncertainty information from a well-known feature compensation method, the IMM, into the recognition engine. The applied technique shows better performance on the Aurora-2 DB.
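For a single Gaussian observation model, uncertainty decoding has a simple closed form: integrating over Gaussian-distributed estimator uncertainty just adds the uncertainty variance to the model variance before scoring. This one-dimensional sketch shows only that scoring rule; the IMM front end that produces the uncertainty estimates is not reproduced here.

```python
import math

def log_gauss(x, mean, var):
    """Log density of a 1-D Gaussian N(mean, var) at x."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def uncertainty_log_likelihood(x_hat, unc_var, mean, var):
    """Uncertainty decoding for a 1-D Gaussian state model: rather
    than scoring the point estimate x_hat against N(mean, var),
    integrate over the estimator's Gaussian uncertainty, which simply
    inflates the model variance by unc_var."""
    return log_gauss(x_hat, mean, var + unc_var)
```

With zero uncertainty this reduces to conventional point-estimate scoring, and as the uncertainty grows, badly compensated frames are penalized less, which is exactly the robustness the abstract claims for low-SNR conditions.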

Semi-supervised domain adaptation using unlabeled data for end-to-end speech recognition (라벨이 없는 데이터를 사용한 종단간 음성인식기의 준교사 방식 도메인 적응)

  • Jeong, Hyeonjae; Goo, Jahyun; Kim, Hoirin
    • Phonetics and Speech Sciences / v.12 no.2 / pp.29-37 / 2020
  • Recently, neural network-based deep learning algorithms have dramatically improved performance compared to classical Gaussian mixture model-based hidden Markov model (GMM-HMM) automatic speech recognition (ASR) systems. In addition, research on end-to-end (E2E) speech recognition systems, which integrate the language modeling and decoding processes, has been actively conducted to better exploit the advantages of deep learning. In general, E2E ASR systems consist of multiple encoder-decoder layers with attention, and they therefore require a large amount of paired speech-text data to achieve good performance. Obtaining paired speech-text data requires a great deal of human labor and time and is a high barrier to building an E2E ASR system. Previous studies have therefore tried to improve E2E ASR performance using relatively small amounts of paired data, but most of them used either speech-only data or text-only data alone. In this study, we propose a semi-supervised training method that enables an E2E ASR system to perform well on corpora from different domains by using both speech-only and text-only data. The proposed method adapts effectively to different domains, showing good performance in the target domain without degrading much in the source domain.
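One common instantiation of such a recipe is pseudo-labeling: the current model transcribes target-domain speech-only data to create synthetic pairs, while target-domain text alone updates the decoder/LM side. This is a hypothetical skeleton of that idea; the abstract does not spell out the paper's exact procedure, and all function names here are placeholders.

```python
def adapt_semi_supervised(transcribe, train_step, lm_step,
                          paired_src, speech_only_tgt, text_only_tgt):
    """Hypothetical semi-supervised adaptation skeleton (placeholder
    callables, not the paper's method):
    - pseudo-label target-domain speech with the current model,
    - train on source-domain pairs plus the pseudo-labeled pairs,
    - update the decoder/LM component from target-domain text alone.
    Returns (number of training pairs used, number of LM texts used)."""
    pseudo = [(utt, transcribe(utt)) for utt in speech_only_tgt]
    for pair in paired_src + pseudo:
        train_step(pair)
    for text in text_only_tgt:
        lm_step(text)
    return len(paired_src) + len(pseudo), len(text_only_tgt)
```

Mixing source pairs into every update, as above, is one simple way to avoid the source-domain degradation the abstract says the proposed method prevents.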