• Title/Summary/Keyword: GMMs

Search Result 24, Processing Time 0.021 seconds

Adaptation and Clustering Method for Speaker Identification with Small Training Data (화자적응과 군집화를 이용한 화자식별 시스템의 성능 및 속도 향상)

  • Kim Se-Hyun;Oh Yung-Hwan
    • MALSORI
    • /
    • no.58
    • /
    • pp.83-99
    • /
    • 2006
  • One key factor that hinders the widespread deployment of speaker identification technologies is the requirement of long enrollment utterances to guarantee low error rate during identification. To gain user acceptance of speaker identification technologies, adaptation algorithms that can enroll speakers with short utterances are highly essential. To this end, this paper applies MLLR speaker adaptation for speaker enrollment and compares its performance against other speaker modeling techniques: GMMs and HMM. Also, to speed up the computational procedure of identification, we apply speaker clustering method which uses principal component analysis (PCA) and weighted Euclidean distance as distance measurement. Experimental results show that MLLR adapted modeling method is most effective for short enrollment utterances and that the GMMs performs better when long utterances are available.

  • PDF

Speaker Identification in Small Training Data Environment using MLLR Adaptation Method (MLLR 화자적응 기법을 이용한 적은 학습자료 환경의 화자식별)

  • Kim, Se-hyun;Oh, Yung-Hwan
    • Proceedings of the KSPS conference
    • /
    • 2005.11a
    • /
    • pp.159-162
    • /
    • 2005
  • Identification is the process automatically identify who is speaking on the basis of information obtained from speech waves. In training phase, each speaker models are trained using each speaker's speech data. GMMs (Gaussian Mixture Models), which have been successfully applied to speaker modeling in text-independent speaker identification, are not efficient in insufficient training data environment. This paper proposes speaker modeling method using MLLR (Maximum Likelihood Linear Regression) method which is used for speaker adaptation in speech recognition. We make SD-like model using MLLR adaptation method instead of speaker dependent model (SD). Proposed system outperforms the GMMs in small training data environment.

  • PDF

SVM Based Speaker Verification Using Sparse Maximum A Posteriori Adaptation

  • Kim, Younggwan;Roh, Jaeyoung;Kim, Hoirin
    • IEIE Transactions on Smart Processing and Computing
    • /
    • v.2 no.5
    • /
    • pp.277-281
    • /
    • 2013
  • Modern speaker verification systems based on support vector machines (SVMs) use Gaussian mixture model (GMM) supervectors as their input feature vectors, and the maximum a posteriori (MAP) adaptation is a conventional method for generating speaker-dependent GMMs by adapting a universal background model (UBM). MAP adaptation requires the appropriate amount of input utterance due to the number of model parameters to be estimated. On the other hand, with limited utterances, unreliable MAP adaptation can be performed, which causes adaptation noise even though the Bayesian priors used in the MAP adaptation smooth the movements between the UBM and speaker dependent GMMs. This paper proposes a sparse MAP adaptation method, which is known to perform well in the automatic speech recognition area. By introducing sparse MAP adaptation to the GMM-SVM-based speaker verification system, the adaptation noise can be mitigated effectively. The proposed method utilizes the L0 norm as a regularizer to induce sparsity. The experimental results on the TIMIT database showed that the sparse MAP-based GMM-SVM speaker verification system yields a 42.6% relative reduction in the equal error rate with few additional computations.

  • PDF

Research about auto-segmentation via SVM (SVM을 이용한 자동 음소분할에 관한 연구)

  • 권호민;한학용;김창근;허강인
    • Proceedings of the IEEK Conference
    • /
    • 2003.07e
    • /
    • pp.2220-2223
    • /
    • 2003
  • In this paper we used Support Vector Machines(SVMs) recently proposed as the loaming method, one of Artificial Neural Network, to divide continuous speech into phonemes, an initial, medial, and final sound, and then, performed continuous speech recognition from it. Decision boundary of phoneme is determined by algorithm with maximum frequency in a short interval. Recognition process is performed by Continuous Hidden Markov Model(CHMM), and we compared it with another phoneme divided by eye-measurement. From experiment we confirmed that the method, SVMs, we proposed is more effective in an initial sound than Gaussian Mixture Models(GMMs).

  • PDF

Terrain Geometry from Monocular Image Sequences

  • McKenzie, Alexander;Vendrovsky, Eugene;Noh, Jun-Yong
    • Journal of Computing Science and Engineering
    • /
    • v.2 no.1
    • /
    • pp.98-108
    • /
    • 2008
  • Terrain reconstruction from images is an ill-posed, yet commonly desired Structure from Motion task when compositing visual effects into live-action photography. These surfaces are required for choreography of a scene, casting physically accurate shadows of CG elements, and occlusions. We present a novel framework for generating the geometry of landscapes from extremely noisy point cloud datasets obtained via limited resolution techniques, particularly optical flow based vision algorithms applied to live-action video plates. Our contribution is a new statistical approach to remove erroneous tracks ('outliers') by employing a unique combination of well established techniques-including Gaussian Mixture Models (GMMs) for robust parameter estimation and Radial Basis Functions (REFs) for scattered data interpolation-to exploit the natural constraints of this problem. Our algorithm offsets the tremendously laborious task of modeling these landscapes by hand, automatically generating a visually consistent, camera position dependent, thin-shell surface mesh within seconds for a typical tracking shot.

Transfer and Survival of Genes Resistant to Antibiotics in Soil (토양환경에서 항생제 내성 인자의 전이 및 생존)

  • Lee, Geon-Hyoung;Lee, Jae-Sei
    • The Korean Journal of Ecology
    • /
    • v.17 no.2
    • /
    • pp.223-235
    • /
    • 1994
  • The transfer of plasmid-borne genes coding for resistance to antibiotics (Ampicillin, Carbenicillin, and tetracycline) among 16 strains isolated from Mankyong River was examined. The survival of donors, recipient, and transformants in sterile and nonsterile soil (the soil was amended with 12% vol/vol with the clay mineral, montmorillonite) was also studied. In sterile soil, the survival was prolonged in the order of donors, transformants, and recipient. The survival of donors, transformants, and recipient increased when the soil was amended with 12% montmorillonite, but not in nonsterile soil. In nonsterile soil, donors survived longer than transformants and recipient, but the survival of transformants and recipient showed no significant differences. The results of these studies suggest that genes can be transferred by transformation, and transferred genes can survive in soil for a considerable time.

  • PDF

Detection of Pathological Voice Using Linear Discriminant Analysis

  • Lee, Ji-Yeoun;Jeong, Sang-Bae;Choi, Hong-Shik;Hahn, Min-Soo
    • MALSORI
    • /
    • no.64
    • /
    • pp.77-88
    • /
    • 2007
  • Nowadays, mel-frequency cesptral coefficients (MFCCs) and Gaussian mixture models (GMMs) are used for the pathological voice detection. This paper suggests a method to improve the performance of the pathological/normal voice classification based on the MFCC-based GMM. We analyze the characteristics of the mel frequency-based filterbank energies using the fisher discriminant ratio (FDR). And the feature vectors through the linear discriminant analysis (LDA) transformation of the filterbank energies (FBE) and the MFCCs are implemented. An accuracy is measured by the GMM classifier. This paper shows that the FBE LDA-based GMM is a sufficiently distinct method for the pathological/normal voice classification, with a 96.6% classification performance rate. The proposed method shows better performance than the MFCC-based GMM with noticeable improvement of 54.05% in terms of error reduction.

  • PDF

Performance Improvement of Classification Between Pathological and Normal Voice Using HOS Parameter (HOS 특징 벡터를 이용한 장애 음성 분류 성능의 향상)

  • Lee, Ji-Yeoun;Jeong, Sang-Bae;Choi, Hong-Shik;Hahn, Min-Soo
    • MALSORI
    • /
    • no.66
    • /
    • pp.61-72
    • /
    • 2008
  • This paper proposes a method to improve pathological and normal voice classification performance by combining multiple features such as auditory-based and higher-order features. Their performances are measured by Gaussian mixture models (GMMs) and linear discriminant analysis (LDA). The combination of multiple features proposed by the frame-based LDA method is shown to be an effective method for pathological and normal voice classification, with a 87.0% classification rate. This is a noticeable improvement of 17.72% compared to the MFCC-based GMM algorithm in terms of error reduction.

  • PDF

Dimension-Reduced Audio Spectrum Projection Features for Classifying Video Sound Clips

  • Kim, Hyoung-Gook
    • The Journal of the Acoustical Society of Korea
    • /
    • v.25 no.3E
    • /
    • pp.89-94
    • /
    • 2006
  • For audio indexing and targeted search of specific audio or corresponding visual contents, the MPEG-7 standard has adopted a sound classification framework, in which dimension-reduced Audio Spectrum Projection (ASP) features are used to train continuous hidden Markov models (HMMs) for classification of various sounds. The MPEG-7 employs Principal Component Analysis (PCA) or Independent Component Analysis (ICA) for the dimensional reduction. Other well-established techniques include Non-negative Matrix Factorization (NMF), Linear Discriminant Analysis (LDA) and Discrete Cosine Transformation (DCT). In this paper we compare the performance of different dimensional reduction methods with Gaussian mixture models (GMMs) and HMMs in the classifying video sound clips.

A Study on Gaussian Mixture Synthesis for High-Performance Speech Recognition (High-Performance 음성 인식을 위한 Efficient Mixture Gaussian 합성에 관한 연구)

  • 이상복;이철희;김종교
    • Proceedings of the IEEK Conference
    • /
    • 2002.06d
    • /
    • pp.195-198
    • /
    • 2002
  • We propose an efficient mixture Gaussian synthesis method for decision tree based state tying that produces better context-dependent models in a short period of training time. This method makes it possible to handle mixture Gaussian HMMs in decision tree based state tying algorithm, and provides higher recognition performance compared to the conventional HMM training procedure using decision tree based state tying on single Gaussian GMMs. This method also reduces the steps of HMM training procedure. We applied this method to training of PBS, and we expect to achieve a little point improvement in phoneme accuarcy and reduction in training time.

  • PDF