• Title/Summary/Keyword: Baum-Welch Re-estimation

Search results: 6

HMM-Based Bandwidth Extension Using Baum-Welch Re-Estimation Algorithm (Baum-Welch 학습법을 이용한 HMM 기반 대역폭 확장법)

  • Song, Geun-Bae; Kim, Austin
    • The Journal of the Acoustical Society of Korea / v.26 no.6 / pp.259-268 / 2007
  • This paper improves the statistical bandwidth extension (BWE) system based on the Hidden Markov Model (HMM). First, the existing HMM training method for BWE, originally proposed by Jax, is analyzed in comparison with the general Baum-Welch training method. Next, based on this analysis, a new HMM-based BWE method is proposed that trains the HMM with the Baum-Welch re-estimation algorithm instead of Jax's method. In conclusion, the Baum-Welch re-estimation algorithm is a generalized form of Jax's training method. It is more flexible and adaptive in modeling the statistical characteristics of the training data, and therefore produces a model that fits the training data better, which results in an improved BWE system. According to the experimental results, the new method performs much better than Jax's BWE system in all cases. Under the given test conditions, the RMS log spectral distortion (LSD) scores improved by 0.31 dB to 0.8 dB, and by 0.52 dB on average.
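
Baum-Welch re-estimation, the training method adopted above, is the EM algorithm for HMMs: a forward-backward pass computes state and transition posteriors, and the parameters are then re-estimated from those expected counts. The following is a minimal sketch, assuming a discrete-observation HMM trained on a single toy symbol sequence. The BWE system described above models continuous spectral features, and neither Jax's method nor the authors' implementation is reproduced here. All function and variable names are hypothetical.

```python
import math


def forward_backward(obs, pi, A, B):
    """Scaled forward-backward pass for a discrete HMM.

    Each alpha[t] is normalised to sum to 1; the product of the per-step
    scale factors equals P(obs | model).
    """
    T, N = len(obs), len(pi)
    alpha = [[0.0] * N for _ in range(T)]
    beta = [[1.0] * N for _ in range(T)]  # beta[T-1] stays all ones
    scale = [0.0] * T
    for t in range(T):
        for j in range(N):
            if t == 0:
                alpha[t][j] = pi[j] * B[j][obs[0]]
            else:
                alpha[t][j] = sum(alpha[t - 1][i] * A[i][j] for i in range(N)) * B[j][obs[t]]
        scale[t] = sum(alpha[t])
        alpha[t] = [a / scale[t] for a in alpha[t]]
    # Backward pass, rescaled with the same factors as the forward pass.
    for t in range(T - 2, -1, -1):
        beta[t] = [sum(A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] for j in range(N)) / scale[t + 1]
                   for i in range(N)]
    return alpha, beta, sum(math.log(s) for s in scale)


def baum_welch_step(obs, pi, A, B):
    """One Baum-Welch (EM) re-estimation of (pi, A, B) from a single sequence.

    Returns the updated parameters and log P(obs) under the *old* parameters.
    """
    T, N, M = len(obs), len(pi), len(B[0])
    alpha, beta, log_lik = forward_backward(obs, pi, A, B)
    # gamma[t][i]: posterior probability of being in state i at time t.
    gamma = []
    for t in range(T):
        g = [alpha[t][i] * beta[t][i] for i in range(N)]
        z = sum(g)
        gamma.append([v / z for v in g])
    # xi[t][i][j]: posterior probability of the transition i -> j between t and t+1.
    xi = []
    for t in range(T - 1):
        x = [[alpha[t][i] * A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] for j in range(N)]
             for i in range(N)]
        z = sum(map(sum, x))
        xi.append([[v / z for v in row] for row in x])
    # M-step: expected counts normalised into probabilities.
    new_pi = gamma[0][:]
    new_A = [[sum(xi[t][i][j] for t in range(T - 1)) / sum(gamma[t][i] for t in range(T - 1))
              for j in range(N)] for i in range(N)]
    new_B = [[sum(gamma[t][i] for t in range(T) if obs[t] == k) / sum(gamma[t][i] for t in range(T))
              for k in range(M)] for i in range(N)]
    return new_pi, new_A, new_B, log_lik


# Toy data and initial guesses (hypothetical values: 2 states, 2 symbols).
obs = [0, 1, 0, 0, 1, 1, 0, 1, 1, 1]
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.5], [0.2, 0.8]]
history = []
for _ in range(20):
    pi, A, B, log_lik = baum_welch_step(obs, pi, A, B)
    history.append(log_lik)
```

As an EM procedure, each iteration cannot decrease the training-data likelihood, so `history` is non-decreasing. A practical system would also pool the expected counts over many training sequences.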

A Study on Adaptive Model Updating and a Priori Threshold Decision for Speaker Verification System (화자 확인 시스템을 위한 적응적 모델 갱신과 사전 문턱치 결정에 관한 연구)

  • 진세훈; 이재희; 강철호
    • The Journal of the Acoustical Society of Korea / v.19 no.5 / pp.20-26 / 2000
  • In a speaker verification system, updating the HMM (hidden Markov model) parameters from a small amount of data and deciding the threshold a priori are crucial for dealing with long-term variability in people's voices. In this paper, we present a speaker model updating technique that adapts to session-to-session intra-speaker variability, together with an a priori threshold decision technique. The proposed technique reduces the verification errors caused by session-to-session intra-speaker variability by adapting the speaker model parameters to new speech data through Baum-Welch re-estimation. The proposed a priori threshold is determined by a hybrid score measure that combines the world-model-based technique with the cohort-model-based technique. The results show that the proposed technique leads to better performance, and that the performance difference between the a posteriori threshold decision approach and the proposed a priori threshold decision approach is small.
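
Score normalization against a world (background) model and against a set of cohort speakers can each be written as a log-likelihood ratio. The abstract does not state how the two are combined, so the linear weighting below (`weight`), the use of the cohort mean, and the toy per-frame log-likelihoods are assumptions made for illustration; the function names are hypothetical.

```python
def world_normalised(ll_speaker, ll_world):
    """Log-likelihood ratio against a single world (background) model."""
    return ll_speaker - ll_world


def cohort_normalised(ll_speaker, ll_cohort):
    """Log-likelihood ratio against the mean score of the cohort speakers."""
    return ll_speaker - sum(ll_cohort) / len(ll_cohort)


def hybrid_score(ll_speaker, ll_world, ll_cohort, weight=0.5):
    """Hypothetical linear combination of the two normalised scores."""
    return (weight * world_normalised(ll_speaker, ll_world)
            + (1.0 - weight) * cohort_normalised(ll_speaker, ll_cohort))


def verify(ll_speaker, ll_world, ll_cohort, threshold, weight=0.5):
    """Accept the identity claim if the hybrid score reaches a threshold fixed in advance."""
    return hybrid_score(ll_speaker, ll_world, ll_cohort, weight) >= threshold


# Toy per-frame average log-likelihoods (hypothetical numbers).
score = hybrid_score(-45.0, -48.0, [-47.0, -49.0, -46.0])
accepted = verify(-45.0, -48.0, [-47.0, -49.0, -46.0], threshold=2.0)
```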

A Study on Hybrid Structure of Semi-Continuous HMM and RBF for Speaker Independent Speech Recognition (화자 독립 음성 인식을 위한 반연속 HMM과 RBF의 혼합 구조에 관한 연구)

  • 문연주; 전선도; 강철호
    • The Journal of the Acoustical Society of Korea / v.18 no.8 / pp.94-99 / 1999
  • Among speech recognition algorithms, hybrid structures of HMMs and neural networks (NNs) show high recognition rates, because they combine the strengths of statistical models and neural network models. In this study, we propose a new hybrid structure of a semi-continuous HMM (SCHMM) and a radial basis function (RBF) network, which, after Baum-Welch estimation, re-estimates the weighting coefficients that determine the observation probabilities. The proposed method exploits the similarity between the basis functions of the RBF hidden layer and the probability density functions of the SCHMM, so that speech signals can be discriminated more sensitively through the learned and estimated RBF weighting coefficients. Simulation results show that the recognition rates of the hybrid SCHMM/RBF structure are higher than those of the SCHMM in recognition experiments on unseen speakers, indicating that the proposed method discriminates better than the SCHMM.
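
In a semi-continuous HMM, every state shares one codebook of Gaussian densities and the states differ only in their mixture weights; this is the similarity to an RBF hidden layer that the abstract relies on. The sketch below, assuming diagonal covariances and toy parameter values (all names hypothetical), computes the shared activations once and forms each state's observation probability as a weighted sum of them. It does not reproduce the paper's RBF re-estimation of the weights.

```python
import math


def gaussian_diag(x, mean, var):
    """Density of a diagonal-covariance Gaussian N(x; mean, diag(var))."""
    log_p = sum(-0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
                for xi, m, v in zip(x, mean, var))
    return math.exp(log_p)


def hidden_layer(x, codebook):
    """Shared Gaussian codebook viewed as an RBF hidden layer: one activation per codeword."""
    return [gaussian_diag(x, mean, var) for mean, var in codebook]


def observation_probs(x, codebook, weights):
    """SCHMM output b_j(x) = sum_k c_jk * f_k(x) for every state j.

    weights[j][k] is the SCHMM mixture weight c_jk or, in the RBF view,
    the output-layer weight from hidden unit k to output j.
    """
    f = hidden_layer(x, codebook)  # computed once, shared by all states
    return [sum(c * fk for c, fk in zip(row, f)) for row in weights]


# Toy 1-D example (hypothetical values): two codewords, two states.
codebook = [([0.0], [1.0]), ([2.0], [1.0])]
weights = [[0.7, 0.3], [0.2, 0.8]]
b = observation_probs([0.0], codebook, weights)
```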

Performance Comparison of GMM and HMM Approaches for Bandwidth Extension of Speech Signals (음성신호의 대역폭 확장을 위한 GMM 방법 및 HMM 방법의 성능평가)

  • Song, Geun-Bae; Kim, Austin
    • The Journal of the Acoustical Society of Korea / v.27 no.3 / pp.119-128 / 2008
  • This paper analyzes the relationship between two representative statistical methods for bandwidth extension (BWE), the Gaussian Mixture Model (GMM) and Hidden Markov Model (HMM) methods, and compares their performance. The HMM method is a memory-based system developed to exploit the inter-frame dependency of speech signals, so it can be expected to better estimate the frame-to-frame transitional information of the original spectra. To verify this, a dynamic measure that approximates the first-order time derivative of the spectral function was introduced in addition to a static measure. The comparison shows that the two methods perform similarly on the static measure, whereas on the dynamic measure the HMM method clearly outperforms the GMM method. Moreover, this difference increases in proportion to the number of HMM states. This indicates that the HMM method would be more appropriate, at least for the 'blind BWE' problem. On the other hand, the GMM method may be a preferable alternative to the HMM method in applications where static performance and algorithmic complexity are critical.
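
The difference between a static and a dynamic spectral distortion measure can be illustrated with a small sketch. It rests on stated assumptions: the static measure is taken as the RMS log spectral distortion in dB computed from power spectra, and the dynamic measure applies the same distance to first-order frame differences of the dB spectra as a crude stand-in for the time derivative. The paper's exact definitions may differ, and all names are hypothetical.

```python
import math


def to_db(power_frames):
    """Convert frames of power-spectrum values to dB."""
    return [[10.0 * math.log10(p) for p in frame] for frame in power_frames]


def rms_distance(a_frames, b_frames):
    """Mean over frames of the per-frame RMS difference between two dB spectra."""
    per_frame = [math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))
                 for a, b in zip(a_frames, b_frames)]
    return sum(per_frame) / len(per_frame)


def delta(frames_db):
    """First-order frame difference: a rough approximation of the time derivative."""
    return [[x1 - x0 for x0, x1 in zip(f0, f1)] for f0, f1 in zip(frames_db, frames_db[1:])]


def static_lsd(ref_power, est_power):
    return rms_distance(to_db(ref_power), to_db(est_power))


def dynamic_lsd(ref_power, est_power):
    return rms_distance(delta(to_db(ref_power)), delta(to_db(est_power)))


# Toy spectra (hypothetical): the estimate is the reference with a constant +10 dB gain.
ref = [[1.0, 1.0], [10.0, 10.0]]
est = [[10.0, 10.0], [100.0, 100.0]]
static = static_lsd(ref, est)
dynamic = dynamic_lsd(ref, est)
```

Here the estimate's constant 10 dB gain error gives a static distortion of 10 dB but zero dynamic distortion, so the two measures capture different kinds of error.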

HMM-based Music Identification System for Copyright Protection (저작권 보호를 위한 HMM기반의 음악 식별 시스템)

  • Kim, Hee-Dong; Kim, Do-Hyun; Kim, Ji-Hwan
    • Phonetics and Speech Sciences / v.1 no.1 / pp.63-67 / 2009
  • In this paper, to protect music copyrights, we propose a music identification system that is scalable to the number of registered music pieces and robust to signal-level variations of the registered music. To implement it, we define the new concepts of 'music word' and 'music phoneme' as recognition units for constructing 'music acoustic models'. With these concepts, we apply the HMM-based framework used in continuous speech recognition to identify the music. Each music file is transformed into a sequence of 39-dimensional vectors, which is represented as ordered states with Gaussian mixtures. These ordered states are trained using the Baum-Welch re-estimation method. Music files of suspect copyright are also transformed into vector sequences, and the most probable music file is then identified using the Viterbi algorithm over the music identification network. We implemented a music identification system for 1,000 MP3 music files and tested it under variations in MP3 bit rate and music speed rate. The proposed system shows robust performance under these signal variations. In addition, because the system is HMM-based, its scalability is independent of the number of registered music files.
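
The Viterbi step used for identification finds the single most likely state path through a model. The sketch below assumes discrete observation symbols and tiny two-state left-to-right models standing in for registered songs; the actual system decodes 39-dimensional vectors with Gaussian-mixture states over a network of 'music word' models, which is not reproduced here. `viterbi`, `identify`, and the model values are hypothetical.

```python
import math


def _log(p):
    """log(p) with log(0) = -inf, so impossible transitions are never chosen."""
    return math.log(p) if p > 0.0 else -math.inf


def viterbi(obs, pi, A, B):
    """Return (best log-probability, most likely state path) for a discrete HMM."""
    N = len(pi)
    delta = [_log(pi[i]) + _log(B[i][obs[0]]) for i in range(N)]
    backptrs = []
    for o in obs[1:]:
        ptr, new_delta = [], []
        for j in range(N):
            # Best predecessor state for reaching state j at this step.
            best_i = max(range(N), key=lambda i: delta[i] + _log(A[i][j]))
            ptr.append(best_i)
            new_delta.append(delta[best_i] + _log(A[best_i][j]) + _log(B[j][o]))
        backptrs.append(ptr)
        delta = new_delta
    # Trace the back-pointers from the best final state.
    last = max(range(N), key=lambda i: delta[i])
    path = [last]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    return delta[last], path[::-1]


def identify(obs, models):
    """Pick the registered model whose best Viterbi path scores highest."""
    return max(models, key=lambda name: viterbi(obs, *models[name])[0])


# Two hypothetical 2-state left-to-right "music" models over 2 symbols.
A_lr = [[0.5, 0.5], [0.0, 1.0]]
models = {
    "song_a": ([1.0, 0.0], A_lr, [[0.9, 0.1], [0.1, 0.9]]),
    "song_b": ([1.0, 0.0], A_lr, [[0.1, 0.9], [0.9, 0.1]]),
}
query = [0, 0, 1, 1]
best_log_prob, best_path = viterbi(query, *models["song_a"])
match = identify(query, models)
```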

Human Action Recognition Based on 3D Human Modeling and Cyclic HMMs

  • Ke, Shian-Ru; Thuc, Hoang Le Uyen; Hwang, Jenq-Neng; Yoo, Jang-Hee; Choi, Kyoung-Ho
    • ETRI Journal / v.36 no.4 / pp.662-672 / 2014
  • Human action recognition is used in areas such as surveillance, entertainment, and healthcare. This paper proposes a system to recognize both single and continuous human actions from monocular video sequences, based on 3D human modeling and cyclic hidden Markov models (CHMMs). First, for each frame in a monocular video sequence, the 3D coordinates of the joints of a human object performing multiple cycles of an action are extracted using 3D human modeling techniques. The 3D coordinates are then converted into a set of geometrical relational features (GRFs) to reduce dimensionality and increase discrimination. To reduce dimensionality further, k-means clustering is applied to the GRFs to generate clustered feature vectors. These vectors are used to train a separate CHMM for each action type with the Baum-Welch re-estimation algorithm. To recognize continuous actions composed of several distinct action types, a purpose-designed graphical model systematically concatenates the separately trained CHMMs. The experimental results demonstrate the effectiveness of the proposed system for both single and continuous action recognition.
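
A cyclic HMM can be sketched as a left-to-right chain whose last state also transitions back to the first, so one model can absorb any number of repetitions of an action. The sketch below assumes discrete observations (such as the k-means cluster indices mentioned above), a single self-transition probability `p_stay` shared by every state, and toy emission values. It scores a two-cycle sequence with the forward algorithm under the cyclic model and under a plain left-to-right model. It is not the authors' CHMM or graphical-model implementation, and all names are hypothetical.

```python
import math


def cyclic_transition_matrix(n_states, p_stay):
    """Left-to-right chain whose last state loops back to the first state."""
    A = [[0.0] * n_states for _ in range(n_states)]
    for i in range(n_states):
        A[i][i] = p_stay
        A[i][(i + 1) % n_states] += 1.0 - p_stay  # wrap-around from the last state
    return A


def forward_log_likelihood(obs, pi, A, B):
    """Scaled forward algorithm: log P(obs | model) for a discrete HMM."""
    N = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(N)]
    log_lik = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = [sum(alpha[i] * A[i][j] for i in range(N)) * B[j][obs[t]] for j in range(N)]
        s = sum(alpha)
        if s == 0.0:
            return -math.inf  # the sequence is impossible under this model
        log_lik += math.log(s)
        alpha = [a / s for a in alpha]  # rescale to avoid underflow
    return log_lik


# Hypothetical 3-state model: state k mostly emits cluster index k.
B = [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8]]
pi = [1.0, 0.0, 0.0]
A_cyclic = cyclic_transition_matrix(3, p_stay=0.5)
A_left_to_right = [[0.5, 0.5, 0.0], [0.0, 0.5, 0.5], [0.0, 0.0, 1.0]]  # no wrap-around
two_cycles = [0, 1, 2, 0, 1, 2]  # one action repeated twice
ll_cyclic = forward_log_likelihood(two_cycles, pi, A_cyclic, B)
ll_left_to_right = forward_log_likelihood(two_cycles, pi, A_left_to_right, B)
```

The cyclic model assigns the repeated sequence a higher likelihood because it can return to its first state instead of remaining in the final one.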