• Title/Summary/Keyword: clustering model

The Application of an HMM-based Clustering Method to Speaker Independent Word Recognition (HMM을 기본으로한 집단화 방법의 불특정화자 단어 인식에 응용)

  • Lim, H.;Park, S.-Y.;Park, M.-W.
    • The Journal of the Acoustical Society of Korea
    • /
    • v.14 no.5
    • /
    • pp.5-10
    • /
    • 1995
  • In this paper we present a clustering procedure based on HMMs that yields multiple statistical models capable of absorbing the variation among speakers who say words in different ways. The resulting HMM-clustered models are applied to speaker-independent isolated word recognition. The HMM clustering method splits off from the training set all observation sequences whose likelihood scores fall below a threshold and creates a new model from the observation sequences in the new cluster. Clustering is iterated by assigning each observation sequence to the cluster whose model gives the maximum likelihood score. If any cluster has changed from the previous iteration, the model in that cluster is re-estimated using the Baum-Welch re-estimation procedure. Because the clustering procedure and the parameter estimation are integrated, this method is more efficient than the conventional template-based clustering technique. Experimental data show that the HMM-based clustering procedure leads to $1.43\%$ performance improvements over the conventional template-based clustering method and $2.08\%$ improvements over the single HMM method for recognition of isolated Korean digits.
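
The split, reassign, and re-estimate loop described in this abstract can be sketched roughly as follows. This is only an illustration under loud assumptions: a single Gaussian per cluster stands in for an HMM, a plain maximum-likelihood fit stands in for Baum-Welch, and the names `fit`, `loglik`, `likelihood_clustering`, the threshold value, and the toy data are all hypothetical rather than taken from the paper.

```python
import math
from statistics import mean, pstdev

def fit(seqs):
    """Stand-in for Baum-Welch: fit one Gaussian to all samples in a cluster."""
    samples = [x for seq in seqs for x in seq]
    return mean(samples), max(pstdev(samples), 1e-3)  # floor sigma to avoid division by zero

def loglik(seq, model):
    """Average per-sample log-likelihood of a sequence under a Gaussian model."""
    mu, sigma = model
    return mean(
        -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)
        for x in seq
    )

def likelihood_clustering(seqs, threshold, max_clusters=4, max_iter=20):
    models = [fit(seqs)]          # start from a single model for all sequences
    assign = [0] * len(seqs)
    while len(models) < max_clusters:
        # Split off every sequence that its current model scores below the threshold.
        poor = [s for s, a in zip(seqs, assign) if loglik(s, models[a]) < threshold]
        if not poor:
            break
        models.append(fit(poor))  # new model built from the split-off sequences
        for _ in range(max_iter):
            # Assign each sequence to the cluster whose model scores it highest.
            new_assign = [
                max(range(len(models)), key=lambda k: loglik(s, models[k]))
                for s in seqs
            ]
            changed = {k for old, new in zip(assign, new_assign) if old != new
                       for k in (old, new)}
            assign = new_assign
            if not changed:
                break
            # Re-estimate only the models of clusters whose membership changed.
            for k in changed:
                members = [s for s, a in zip(seqs, assign) if a == k]
                if members:
                    models[k] = fit(members)
    return assign, models

# Toy data: six sequences near 0 and two near 10 (hypothetical values).
seqs = [
    [0.0, 0.2, -0.1], [0.1, -0.2, 0.0], [-0.1, 0.1, 0.2],
    [0.2, 0.0, -0.2], [0.0, 0.1, -0.1], [-0.2, 0.2, 0.1],
    [10.0, 10.2, 9.9], [9.8, 10.1, 10.0],
]
assign, models = likelihood_clustering(seqs, threshold=-3.0)
```

With real speech data, `fit` and `loglik` would be replaced by HMM training and scoring; the control flow of the loop is the part this sketch is meant to show.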

Clustering and classification to characterize daily electricity demand (시간단위 전력사용량 시계열 패턴의 군집 및 분류분석)

  • Park, Dain;Yoon, Sanghoo
    • Journal of the Korean Data and Information Science Society
    • /
    • v.28 no.2
    • /
    • pp.395-406
    • /
    • 2017
  • The purpose of this study is to identify the pattern of daily electricity demand through clustering and classification. The hourly data were collected by KPX (Korea Power Exchange) between 2008 and 2012. Because electricity demand data are time series data, the time trend was removed before the pattern of daily electricity demand was analyzed. We considered k-means clustering, Gaussian mixture model clustering, and functional clustering in order to find the optimal clustering method. The classification analysis was conducted to understand the relationship between the clusters and external factors: day of the week, holidays, and weather. The data were divided into training data and test data. The training data consisted of the external factors and cluster labels between 2008 and 2011. The test data were the daily external factors in 2012. Decision tree, random forest, support vector machine, and naive Bayes classifiers were used. As a result, Gaussian model-based clustering and random forest showed the best prediction performance when the number of clusters was 8.

Retrieve System for Performance support of Vocabulary Clustering Model In Continuous Vocabulary Recognition System (연속 어휘 인식 시스템에서 어휘 클러스터링 모델의 성능 지원을 위한 검색 시스템)

  • Oh, Sang Yeob
    • Journal of Digital Convergence
    • /
    • v.10 no.9
    • /
    • pp.339-344
    • /
    • 2012
  • An established continuous vocabulary recognition system improved its recognition rate by using a decision-tree-based tying modeling method. However, because the system model cannot support the retrieval of phoneme data, accuracy is hard to secure. To address this problem, we remodeled the system so that it can retrieve probabilistic models from the continuous vocabulary clustering model at the phoneme level. The system presented in this paper achieved a recognition rate of 95.88%.

Genetically Optimized Information Granules-based FIS (유전자적 최적 정보 입자 기반 퍼지 추론 시스템)

  • Park, Keon-Jun;Oh, Sung-Kwun;Lee, Young-Il
    • Proceedings of the KIEE Conference
    • /
    • 2005.10b
    • /
    • pp.146-148
    • /
    • 2005
  • In this paper, we propose a genetically optimized identification of an information granulation (IG)-based fuzzy model. To design the IG-based fuzzy model optimally, we exploit a hybrid identification through genetic algorithms (GAs) and Hard C-Means (HCM) clustering. The initial structure of the fuzzy model is identified by using GAs to determine the number of inputs, the selected input variables, the number of membership functions, and the conclusion inference type. Granulation of information data with the aid of the HCM clustering algorithm helps determine the initial parameters of the fuzzy model, such as the initial apexes of the membership functions and the initial values of the polynomial functions used in the premise and consequence parts of the fuzzy rules. The initial parameters are then tuned effectively with the aid of the genetic algorithms and the least squares method. We also exploit consecutive identification of the fuzzy model for the identification of structure and parameters. A numerical example is included to evaluate the performance of the proposed model.
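
As a rough illustration of how HCM clustering could supply initial membership-function apexes, the sketch below runs Hard C-Means on one scalar input and uses the sorted cluster centers as apexes of triangular membership functions. The triangular shape, the evenly spaced initialization, the function names, and the data are assumptions made for the example; the abstract does not specify them.

```python
def hard_c_means_1d(data, c, max_iter=100):
    """Hard C-Means on scalar data; returns the sorted cluster centers."""
    if c < 2 or c > len(data):
        raise ValueError("c must be between 2 and len(data)")
    pts = sorted(data)
    # Deterministic start: centers taken at evenly spaced ranks of the sorted data.
    centers = [pts[round(i * (len(pts) - 1) / (c - 1))] for i in range(c)]
    for _ in range(max_iter):
        groups = [[] for _ in range(c)]
        for x in pts:
            nearest = min(range(c), key=lambda k: abs(x - centers[k]))
            groups[nearest].append(x)  # hard assignment: exactly one cluster per point
        new_centers = [
            sum(g) / len(g) if g else centers[k] for k, g in enumerate(groups)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return sorted(centers)

def triangular_membership(x, left, apex, right):
    """Triangular membership: 1 at the apex, falling to 0 at left and right."""
    if x <= left or x >= right:
        return 0.0
    if x <= apex:
        return (x - left) / (apex - left)
    return (right - x) / (right - apex)

# Hypothetical input data with three visible groups.
data = [1.0, 1.2, 0.8, 5.0, 5.1, 4.9, 9.0, 9.2, 8.8]
apexes = hard_c_means_1d(data, c=3)
degree = triangular_membership(3.0, apexes[0], apexes[1], apexes[2])
```

In the proposed model these apexes are only starting values; the abstract states that they are subsequently tuned by the GAs and the least squares method.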

Use of Factor Analyzer Normal Mixture Model with Mean Pattern Modeling on Clustering Genes

  • Kim Seung-Gu
    • Communications for Statistical Applications and Methods
    • /
    • v.13 no.1
    • /
    • pp.113-123
    • /
    • 2006
  • The normal mixture model (NMM) is frequently used to cluster genes in microarray gene expression data. In this paper, some of the component means of the NMM are modelled by a linear regression model whose design matrix represents the pattern between sample classes in the microarray matrix. Modelling the component means with given design matrices has the advantage that it can lead to clusters that are designed in advance. However, it suffers from an 'overfitting' problem because in practice gene data are often high-dimensional. This problem also arises when the NMM restricted by the linear model for component means is fitted. To cope with this problem, this paper proposes using the factor analyzer NMM restricted by the linear model to cluster genes. Several design matrices that are useful for clustering genes are also provided.

Clustering Observations for Detecting Multiple Outliers in Regression Models

  • Seo, Han-Son;Yoon, Min
    • The Korean Journal of Applied Statistics
    • /
    • v.25 no.3
    • /
    • pp.503-512
    • /
    • 2012
  • Detecting outliers in a linear regression model eventually fails when similar observations are classified differently in a sequential process. In such circumstances, identifying clusters and applying certain methods to the clustered data can prevent a failure to detect outliers and is computationally efficient because the data are reduced. In this paper, we suggest a clustering procedure for this purpose and provide examples that illustrate the suggested procedure applied to the Hadi-Simonoff (1993) method, the reverse Hadi-Simonoff method, and the Gentleman-Wilk (1975) method.

Design of improved Multi-FNN for Nonlinear Process modeling

  • Park, Hosung;Sungkwun Oh
    • Institute of Control, Robotics and Systems (제어로봇시스템학회): Conference Proceedings
    • /
    • 2002.10a
    • /
    • pp.102.2-102
    • /
    • 2002
  • In this paper, the improved Multi-FNN (Fuzzy-Neural Networks) model is identified and optimized using the HCM (Hard C-Means) clustering method and optimization algorithms. The proposed Multi-FNN is based on FNN and uses simplified and linear inference as the fuzzy inference method and the error back-propagation algorithm as the learning rule. We use HCM clustering and genetic algorithms (GAs) to identify both the structure and the parameters of a Multi-FNN model. Here, the HCM clustering method, which is carried out to preprocess the process data for system modeling, is utilized to determine the structure of the Multi-FNN according to the divisions of the input-output space using I/O process data. Also, the parame...

Efficient Continuous Vocabulary Clustering Modeling for Tying Model Recognition Performance Improvement (공유모델 인식 성능 향상을 위한 효율적인 연속 어휘 군집화 모델링)

  • Ahn, Chan-Shik;Oh, Sang-Yeob
    • Journal of the Korea Society of Computer and Information
    • /
    • v.15 no.1
    • /
    • pp.177-183
    • /
    • 2010
  • In a continuous vocabulary recognition system based on statistical methods, recognition is performed using probability distributions, and modeling uses phoneme clustering to estimate the probability parameters from samples. During vocabulary search, the recognition rate drops because undefined and inserted phonemes affect the estimated probability parameters, and a clustering model based on a single Gaussian model has the drawback that its accuracy cannot be secured. To improve on this, we propose a mixed clustering model in which a Gaussian mixture probability distribution is optimized using Euclidean and Bhattacharyya distance measures, together with a system model that searches for the phoneme probability model within the clustered model. The system achieved a vocabulary-dependent recognition rate of 98.63% and a vocabulary-independent recognition rate of 97.91%.
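
To make the two distance measures named in this abstract concrete, here is a minimal sketch for univariate Gaussians. The Bhattacharyya formula is the standard closed form for two normal distributions; how the paper combines the two measures is not stated in the abstract, so the function names and the comparison below are illustrative assumptions only.

```python
import math

def euclidean(mu1, mu2):
    """Euclidean distance between two scalar means."""
    return abs(mu1 - mu2)

def bhattacharyya(mu1, var1, mu2, var2):
    """Bhattacharyya distance between N(mu1, var1) and N(mu2, var2)."""
    mean_term = 0.25 * (mu1 - mu2) ** 2 / (var1 + var2)
    var_term = 0.5 * math.log((var1 + var2) / (2.0 * math.sqrt(var1 * var2)))
    return mean_term + var_term

# Two hypothetical phoneme models with the same mean but different spread:
# the Euclidean distance between the means is zero, while the Bhattacharyya
# distance still separates them because it accounts for the variances.
d_euclid = euclidean(0.0, 0.0)
d_bhatt = bhattacharyya(0.0, 1.0, 0.0, 4.0)
```

The example suggests one reason to use both measures: the Euclidean distance only sees how far apart the means are, whereas the Bhattacharyya distance also reflects how much the distributions overlap.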

A Novel Multi-Path Routing Algorithm Based on Clustering for Wireless Mesh Networks

  • Liu, Chun-Xiao;Zhang, Yan;Xu, E;Yang, Yu-Qiang;Zhao, Xu-Hui
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.8 no.4
    • /
    • pp.1256-1275
    • /
    • 2014
  • As one of the new self-organizing and self-configuring broadband networks, wireless mesh networks are becoming increasingly attractive. In order to solve the load balancing problem in wireless mesh networks, this paper proposes a novel multi-path routing algorithm based on clustering (Cluster_MMesh). In the clustering stage, building on the maximum connectivity clustering algorithm and the k-hop clustering algorithm and following the idea of maximum connectivity, a new concept of node connectivity degree is proposed, which makes the selection of the cluster head simpler and more reasonable. During clustering, the node with the lowest expected load in the candidate border gateway node set is selected as the border gateway node. In the multi-path routing establishment stage, we use the intra-cluster and inter-cluster multi-path routing algorithms to establish multi-path routes from the source node to the destination node. Finally, in the traffic allocation stage, we use the virtual disjoint multi-path model (Vdmp) to allocate the network traffic. Simulation results show that the Cluster_MMesh routing algorithm can help increase the packet delivery rate, reduce the average end-to-end delay, and improve network performance.
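
The maximum connectivity idea that the clustering stage builds on can be sketched as below. This is the plain highest-degree heuristic, not the paper's new node connectivity degree, which the abstract does not define; the function name, the tie-breaking rule (lowest node id), and the example topology are assumptions for illustration.

```python
def max_connectivity_clusters(adj):
    """Greedy one-hop clustering: repeatedly pick the unassigned node with the
    most unassigned neighbors as cluster head and absorb those neighbors."""
    unassigned = set(adj)
    clusters = {}
    while unassigned:
        # sorted() makes ties deterministic: the lowest node id wins.
        head = max(sorted(unassigned), key=lambda n: len(adj[n] & unassigned))
        members = (adj[head] & unassigned) | {head}
        clusters[head] = members
        unassigned -= members
    return clusters

# Hypothetical mesh topology given as an undirected adjacency map.
adj = {
    1: {2, 3, 4},
    2: {1},
    3: {1},
    4: {1, 5},
    5: {4, 6},
    6: {5},
}
clusters = max_connectivity_clusters(adj)
```

Border gateway selection by expected load and the multi-path routing stages are outside the scope of this sketch.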

Clustering properties and halo occupation of Lyman-break galaxies at z ~ 4

  • Park, Jaehong;Kim, Han-Seek;Wyithe, Stuart B.;Lacey, Cedric G.;Baugh, Carlton M.
    • The Bulletin of The Korean Astronomical Society
    • /
    • v.40 no.1
    • /
    • pp.59.3-60
    • /
    • 2015
  • We investigate the clustering properties of Lyman-break galaxies (LBGs) at z ~ 4. Using the hierarchical galaxy formation model GALFORM, we predict the angular correlation function (ACF) of LBGs and compare this with the measured ACF from combined survey fields consisting of the Hubble eXtreme Deep Field (XDF) and CANDELS. We find that the predicted ACF is in good agreement with the measured ACFs. However, when we divide the model LBGs into bright and faint subsets, the predicted ACFs are less consistent with observations. We quantify the dependence of clustering on luminosity and show that the fraction of satellite LBGs is important for determining the amplitude of the ACF at small scales. We find that central LBGs predominantly reside in ${\sim}10^{11}h^{-1}M_{\odot}$ haloes and satellites reside in haloes of mass ${\sim}10^{12}-10^{13}h^{-1}M_{\odot}$. The model predicts fewer bright satellite LBGs than is inferred from the observations. LBGs in the tails of the redshift distribution contribute a significant additional clustering signal, especially on small scales. This spurious clustering may affect the interpretation of the halo occupation distribution, including the minimum halo mass and the abundance of satellite LBGs.
