• Title/Summary/Keyword: LBG Clustering

A Study on Modified Clustering Algorithm for Text-Dependent Speaker Verification System (문장종속 화자확인 시스템을 위한 개선된 군집화 알고리즘에 관한 연구)

  • 강철호;정희석
    • The Journal of the Acoustical Society of Korea / v.23 no.7 / pp.548-553 / 2004
  • In this paper, we propose a modified LBG algorithm to minimize quantization errors. When the conventional LBG algorithm is applied to a speaker verification system, problems can arise from the small amount of training data. Specifically, quantization error results from using a fixed-size codebook that does not account for speaker characteristics, and splitting vectors in the wrong direction degrades the performance of the speaker verification system. We therefore propose a modified clustering method that uses a variable-size codebook according to speaker characteristics and chooses the splitting direction by finding the member farthest from the mean and then finding another member starting from that member. Simulation results show the effectiveness of the proposed algorithm.
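The splitting step described in this abstract can be sketched roughly as follows. This is only an illustration, not the authors' implementation: the abstract does not say how the second member is chosen, so the sketch assumes it is the member farthest from the first one. The names `split_seeds` and `farthest_from` are hypothetical.

```python
import math

def mean(points):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]

def farthest_from(ref, points):
    """Return the member of points with the largest Euclidean distance to ref."""
    return max(points, key=lambda p: math.dist(ref, p))

def split_seeds(points):
    """Pick two seeds for splitting a cluster.

    First seed: the member farthest from the cluster mean.
    Second seed: ASSUMPTION, the member farthest from the first seed.
    """
    first = farthest_from(mean(points), points)
    second = farthest_from(first, points)
    return first, second

cluster = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (10.0, 0.0)]
seed_a, seed_b = split_seeds(cluster)
print(seed_a, seed_b)
```

Seeding the split from actual members keeps both new codewords inside the data, in contrast to perturbing the mean in a fixed direction.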

The Effect of the Number of Clusters on Speech Recognition with Clustering by ART2/LBG

  • Lee, Chang-Young
    • Phonetics and Speech Sciences / v.1 no.2 / pp.3-8 / 2009
  • In an effort to improve speech recognition, we investigated the effect of the number of clusters. In conventional LBG clustering, the number of codebook clusters doubles at each bifurcation and hence cannot be chosen arbitrarily in a natural way. To bring the number of clusters under our control, we combined adaptive resonance theory (ART2) with LBG and performed the clustering in two stages. The codebook thus formed was used in subsequent fuzzy vector quantization (FVQ) and HMM processing for speech recognition tests. Compared to conventional LBG, our method reduced the best recognition error rate by 0 to 0.9%, depending on the vocabulary size. The results also showed that the optimal number of clusters lies between 400 and 800, corresponding to the limits of small- and large-vocabulary isolated-word speech recognition, respectively.
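The doubling behaviour mentioned in this abstract can be seen in a minimal sketch of LBG with binary splitting. It is written under assumptions of my own (an additive perturbation `eps`, a fixed number of refinement passes `iters`, Euclidean distance) and is not the paper's ART2/LBG method; the function name `lbg` and the toy data are hypothetical.

```python
import math

def centroid(points):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))

def nearest(codebook, p):
    """Index of the codeword closest to p."""
    return min(range(len(codebook)), key=lambda i: math.dist(codebook[i], p))

def lbg(points, target_size, eps=0.01, iters=20):
    """Grow a codebook by repeated binary splitting, then refine it.

    The size goes 1, 2, 4, ... so the result is the first power of two
    that is >= target_size, not target_size itself.
    """
    codebook = [centroid(points)]
    sizes = [1]
    while len(codebook) < target_size:
        # Binary split: shift every codeword by +eps and by -eps.
        codebook = [tuple(c + s for c in cw) for cw in codebook for s in (eps, -eps)]
        for _ in range(iters):
            # Assign every point to its nearest codeword, then recompute centroids.
            cells = [[] for _ in codebook]
            for p in points:
                cells[nearest(codebook, p)].append(p)
            # An empty cell keeps its previous codeword.
            codebook = [centroid(cell) if cell else cw for cell, cw in zip(cells, codebook)]
        sizes.append(len(codebook))
    return codebook, sizes

corners = [(0.0, 0.0), (0.0, 10.0), (10.0, 0.0), (10.0, 10.0)]
offsets = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5)]
points = [(x + dx, y + dy) for (x, y) in corners for (dx, dy) in offsets]
codebook, sizes = lbg(points, target_size=3)
print(sizes)  # the requested size 3 is overshot because sizes only double
```

Asking for three codewords still yields four, which is the restriction that motivates controlling the cluster count by other means.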

Improvement of Network Intrusion Detection Rate by Using LBG Algorithm Based Data Mining (LBG 알고리즘 기반 데이터마이닝을 이용한 네트워크 침입 탐지율 향상)

  • Park, Seong-Chul;Kim, Jun-Tae
    • Journal of Intelligence and Information Systems / v.15 no.4 / pp.23-36 / 2009
  • Network intrusion detection has been continuously improved by using data mining techniques. There are two kinds of data mining methods for intrusion detection: supervised learning with class labels and unsupervised learning without class labels. In this paper, we study how to improve network intrusion detection accuracy by using the LBG clustering algorithm, which is an unsupervised learning method. The K-means method, which starts from random initial centroids and clusters based on the Euclidean distance, is vulnerable to noisy data and outliers. The nonuniform binary split algorithm uses binary decomposition without requiring initial values, and it is relatively fast. In this paper, we apply an EM (Expectation Maximization) based LBG algorithm that combines the strengths of the two algorithms to intrusion detection. Experimental results on the KDD Cup dataset show that detection accuracy can be improved by using the LBG algorithm.

A Study on the Reference Template Database Design Method for Frame-based Classification of Underwater Transient Signals (프레임 기반의 수중 천이신호 식별을 위한 기준패턴의 데이터베이스 구성 방법에 관한 연구)

  • Lim, Tae-Gyun;Ryu, Jong-Youb;Kim, Tae-Hwan;Bae, Keun-Sung
    • Proceedings of the IEEK Conference / 2008.06a / pp.885-886 / 2008
  • This paper presents a reference template design method for frame-based classification of underwater transient signals. In the proposed method, the frame-based feature vectors of each reference signal are clustered using the LBG clustering algorithm to reduce the number of feature vectors in each class. Experimental results show that the LBG clustering algorithm achieves a drastic reduction of the reference database while maintaining classification performance.

Fast LBG Algorithm to Reduce the Computational Complexity

  • Kim Dong-Hyun;Kang Chul-Ho
    • The Journal of the Acoustical Society of Korea / v.24 no.4E / pp.123-127 / 2005
  • In this paper, we propose a new method for reducing the number of distance calculations in the LBG (Linde, Buzo, Gray) algorithm, a widely used method for constructing codebooks for vector quantization in speech recognition systems. The proposed algorithm reduces the distance calculations between input vectors and codewords by exploiting the observation that codewords stabilize quickly as the number of iterations increases. Simulation results show that the running time is reduced by over 43.77% on average compared with the current LBG algorithm, without sacrificing codebook performance.

Nonlinear Process Modeling Using Hard Partition-based Inference System (Hard 분산 분할 기반 추론 시스템을 이용한 비선형 공정 모델링)

  • Park, Keon-Jun;Kim, Yong-Kab
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology / v.7 no.4 / pp.151-158 / 2014
  • In this paper, we introduce an inference system based on the hard scatter partition method and use it to model a nonlinear process. To do this, we use the hard scatter partition method, which partitions the input space in scatter form with membership degrees of 0 or 1. The proposed method is implemented with the C-Means clustering algorithm; to compensate for its sensitivity to the initial center values, the LBG algorithm is applied to obtain the initial centers by means of binary splitting. The hard-scatter-partitioned input space forms the rules in rule-based system modeling. The premise parameters of the rules are determined by the membership matrix obtained from the C-Means clustering algorithm. The consequence part of the rules is expressed as polynomial functions, and the coefficient parameters of each rule are determined by the standard least-squares method. Data widely used for nonlinear processes are used to model the nonlinear process and to evaluate its characteristics.
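A hard partition with membership degrees of 0 or 1 can be illustrated by a small sketch that builds the membership matrix from given cluster centers. The function name `hard_membership`, the Euclidean distance, and the sample values are assumptions made for illustration; the paper's own rule construction is not reproduced here.

```python
import math

def hard_membership(points, centers):
    """Return a membership matrix U where U[n][i] is 1 if center i is the
    nearest center to point n, and 0 otherwise (hard partition)."""
    matrix = []
    for p in points:
        k = min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
        matrix.append([1 if i == k else 0 for i in range(len(centers))])
    return matrix

points = [(0.0, 0.0), (9.0, 9.0), (1.5, 0.5)]
centers = [(1.0, 1.0), (10.0, 10.0)]  # hypothetical centers, e.g. from C-Means
U = hard_membership(points, centers)
print(U)
```

Each row contains exactly one 1, so every input point activates exactly one rule.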

Vector Quantization of Reference Signals for Efficient Frame-Based Classification of Underwater Transient Signals (프레임 기반의 효율적인 수중 천이신호 식별을 위한 참조 신호의 벡터 양자화)

  • Lim, Tae-Gyun;Kim, Tae-Hwan;Bae, Keun-Sung;Hwang, Chan-Sik
    • The Journal of Korean Institute of Communications and Information Sciences / v.34 no.2C / pp.181-185 / 2009
  • When underwater transient signals are classified with frame-by-frame decisions, the database design method for the reference feature vectors influences system performance, including database size, computational burden, and recognition rate. In this paper, the LBG vector quantization algorithm is applied to reduce the number of feature vectors for each reference signal, for efficient classification of underwater transient signals. Experimental results show that LBG vector quantization achieves a drastic reduction of the database size while maintaining classification performance.
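Frame-by-frame classification against a vector-quantized reference database might look like the sketch below. It assumes each class is represented by a small codebook, that each frame is labeled by its nearest codeword, and that the signal label is a majority vote over frames; the voting rule, the function names, and the codebook values are hypothetical and not taken from the paper.

```python
import math
from collections import Counter

def classify_frame(frame, codebooks):
    """Label a frame with the class whose codebook holds the nearest codeword."""
    return min(
        codebooks,
        key=lambda label: min(math.dist(frame, cw) for cw in codebooks[label]),
    )

def classify_signal(frames, codebooks):
    """ASSUMPTION: the signal label is the majority vote of its frame labels."""
    votes = Counter(classify_frame(f, codebooks) for f in frames)
    return votes.most_common(1)[0][0]

# Hypothetical reference database: two codewords per class instead of
# every training frame, which is where the size reduction comes from.
codebooks = {
    "class_a": [(0.0, 0.0), (1.0, 1.0)],
    "class_b": [(5.0, 5.0), (6.0, 4.0)],
}
frames = [(0.2, 0.1), (0.9, 1.2), (5.1, 4.8)]
label = classify_signal(frames, codebooks)
print(label)
```

The cost per frame scales with the number of codewords rather than the number of stored reference frames, which is the computational benefit the abstract refers to.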

Distinction of Color Similarity for Clothes based on the LBG Algorithm (LBG 알고리즘 기반의 의상 색상 유사성 판별)

  • Ju, Hyung-Don;Hong, Min;Cho, We-Duke;Moon, Nam-Mee;Choi, Yoo-Joo
    • Journal of Internet Computing and Services / v.9 no.5 / pp.117-130 / 2008
  • This paper proposes a stable and robust method for determining the color similarity of clothes under various light sources using the LBG algorithm. Because conventional methods, such as histogram intersection and the accumulated histogram, are highly sensitive to changes in the lighting environment, the measured color similarity for the same garment can differ under complicated light sources. To reduce the effects of the light sources, the properties of hue and saturation, which consistently preserve the character of a color under varying light sources, are analyzed to define the characteristics of the color distribution. In the two-dimensional space defined by hue and saturation, the LBG algorithm, a non-parametric clustering approach, is applied to examine the color distribution of the image of each garment. The color similarity of two images is defined as the average Euclidean distance between the mapped clusters computed from the clustering results of both images. To demonstrate the stability of the proposed method, its color similarity results are compared with those of traditional histogram-based methods using a dozen garment examples captured under different lighting environments. Our method successfully distinguishes image pairs of the same garment from image pairs of different garments, and this color similarity classification of garment images achieves a success rate of 91.6%.
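The similarity measure in this abstract can be approximated by a short sketch: colors are projected onto hue and saturation, and two sets of cluster centroids are compared by an average Euclidean distance. The mapping between clusters is not specified in the abstract, so the sketch assumes each centroid of one image is mapped to its nearest centroid in the other; it also ignores the circular nature of hue. The names `hue_sat` and `cluster_distance` are hypothetical.

```python
import colorsys
import math

def hue_sat(rgb):
    """Project an 8-bit RGB color onto (hue, saturation), dropping brightness."""
    r, g, b = (c / 255.0 for c in rgb)
    h, s, _v = colorsys.rgb_to_hsv(r, g, b)
    return (h, s)

def cluster_distance(centroids_a, centroids_b):
    """Average distance from each centroid in A to its nearest centroid in B.

    ASSUMPTION: nearest-centroid mapping; hue wrap-around is not handled.
    """
    total = sum(min(math.dist(a, b) for b in centroids_b) for a in centroids_a)
    return total / len(centroids_a)

bright_red = hue_sat((200, 40, 40))
dim_red = hue_sat((100, 20, 20))   # same color at half the brightness
blue = hue_sat((40, 40, 200))

same = cluster_distance([bright_red], [dim_red])
different = cluster_distance([bright_red], [blue])
print(same, different)
```

Dropping the brightness channel is what makes the two reds nearly coincide in this example, while the red and blue centroids stay far apart.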

A Study on VQ/HMM using Nonlinear Clustering and Smoothing Method (비선형 집단화와 완화기법을 이용한 VQ/HMM에 관한 연구)

  • 정희석;강철호
    • The Journal of the Acoustical Society of Korea / v.18 no.3 / pp.35-42 / 1999
  • In this paper, a modified clustering algorithm is proposed to improve the discrimination of the discrete HMM (Hidden Markov Model); it increases the recognition rate by 2.16% compared with the original HMM using the K-means or LBG algorithm. In addition, to prevent the recognition rate from dropping because of insufficient training data during HMM training, a modified probabilistic smoothing method is proposed, which increases the recognition rate by 3.07% in the speaker-independent case. In experiments applying both proposed algorithms, the average recognition rate increased by 4.66% in the speaker-independent case compared with the original VQ/HMM.

Iterative LBG Clustering for SIMO Channel Identification

  • Daneshgaran, Fred;Laddomada, Massimiliano
    • Journal of Communications and Networks / v.5 no.2 / pp.157-166 / 2003
  • This paper deals with the problem of channel identification for Single Input Multiple Output (SIMO) slow fading channels using clustering algorithms. Due to the intrinsic memory of the discrete-time channel model, over short observation periods the received data vectors of the SIMO model are spread in clusters because of the AWGN. Each cluster is practically centered on the ideal noiseless channel output labels, and the noisy received vectors are distributed according to a multivariate Gaussian distribution. Starting from the Markov SIMO channel model, simultaneous maximum likelihood estimation of the input vector and the channel coefficients reduces to finding the values of this pair that minimize the sum of the Euclidean norms between the received and the estimated output vectors. The Viterbi algorithm can be used for this purpose, provided the trellis diagram of the Markov model can be labeled with the noiseless channel outputs. The problem of identifying the ideal channel outputs, which is the focus of this paper, is then equivalent to designing a Vector Quantizer (VQ) from a training set corresponding to the observed noisy channel outputs. Linde-Buzo-Gray (LBG)-type clustering algorithms [1] could be used to obtain the noiseless channel output labels from the noisy received vectors. One problem with the use of such algorithms for blind time-varying channel identification is codebook initialization. This paper looks at two critical issues regarding the use of VQ for channel identification. The first concerns the applicability of this technique in general; we present theoretical results for the conditions under which the technique may be applicable. The second aims at overcoming the codebook initialization problem by proposing a novel approach that attempts to make the first phase of channel estimation faster than the classical codebook initialization methods. Sample simulation results are provided confirming the effectiveness of the proposed initialization technique.