• Title/Summary/Keyword: clustering model


Comparison of time series clustering methods and application to power consumption pattern clustering

  • Kim, Jaehwi;Kim, Jaehee
    • Communications for Statistical Applications and Methods
    • /
    • v.27 no.6
    • /
    • pp.589-602
    • /
    • 2020
  • The development of smart grids has enabled the easy collection of a large amount of power data. Power consumption shares some common patterns, which makes clustering useful when analyzing big power data. In this paper, clustering analysis is based on distance functions for time series and clustering algorithms to discover patterns in power consumption data. We use 10 distance measures to find clusters that reflect the characteristics of time series data. A simulation study compares the distance measures for clustering. Cluster validity measures such as the error rate, similarity index, Dunn index, and silhouette values are also calculated and compared. Real power consumption data are then clustered with the five distance measures that performed better than the others in the simulation.
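As a rough illustration of one of the validity measures named in this abstract, the sketch below computes silhouette values for a handful of labeled time series. It is not the authors' code: the Euclidean distance, the toy series, and the function names `euclidean` and `silhouette_values` are assumptions made for the example, and any of the paper's time series distances could be passed in as `dist`.

```python
import math

def euclidean(a, b):
    """Pointwise Euclidean distance between two equal-length series (assumed measure)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def silhouette_values(series, labels, dist=euclidean):
    """Return one silhouette value per series; needs at least two clusters."""
    n = len(series)
    values = []
    for i in range(n):
        # a: mean distance to the other members of the same cluster
        same = [dist(series[i], series[j])
                for j in range(n) if j != i and labels[j] == labels[i]]
        if not same:
            values.append(0.0)  # common convention for singleton clusters
            continue
        a = sum(same) / len(same)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(dist(series[i], series[j]) for j in range(n) if labels[j] == other)
            / labels.count(other)
            for other in set(labels) if other != labels[i]
        )
        values.append((b - a) / max(a, b))
    return values

# Hypothetical toy data: two flat-ish daily profiles and two peaky ones.
series = [
    [1.0, 2.0, 1.0, 2.0],
    [1.1, 2.1, 0.9, 2.0],
    [5.0, 1.0, 5.0, 1.0],
    [5.2, 0.9, 4.9, 1.1],
]
labels = [0, 0, 1, 1]
scores = silhouette_values(series, labels)
```

Values close to 1 indicate that a series sits well inside its assigned cluster, while values near 0 or below suggest an ambiguous or wrong assignment.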

Gene Expression Pattern Analysis via Latent Variable Models Coupled with Topographic Clustering

  • Chang, Jeong-Ho;Chi, Sung Wook;Zhang, Byoung Tak
    • Genomics & Informatics
    • /
    • v.1 no.1
    • /
    • pp.32-39
    • /
    • 2003
  • We present a latent variable model-based approach to the analysis of gene expression patterns, coupled with topographic clustering. The aspect model, a latent variable model for dyadic data, is applied to extract latent patterns underlying complex variations of gene expression levels. Topographic clustering is then performed to find coherent groups of genes, based on the extracted latent patterns as well as individual gene expression behaviors. Applied to cell cycle-regulated genes of the yeast Saccharomyces cerevisiae, the proposed method discovered biologically meaningful patterns related to characteristic expression behavior in particular cell cycle phases. In addition, displaying the variation in the composition of these latent patterns on the cluster map made the resulting cluster structure easier to interpret. From this, we argue that latent variable models, coupled with topographic clustering, are a promising tool for exploratory analysis of gene expression data.

Normal Mixture Model with General Linear Regressive Restriction: Applied to Microarray Gene Clustering

  • Kim, Seung-Gu
    • Communications for Statistical Applications and Methods
    • /
    • v.14 no.1
    • /
    • pp.205-213
    • /
    • 2007
  • In this paper, we propose a normal mixture model whose component means are subject to a general linear restriction based on linear regression, and we provide a fitting method that uses the EM algorithm and Lagrange multipliers. The model is applied to gene clustering of microarray expression data, where it performs very well on a real data set. The model also lets an analyst obtain the clusters of interest, because the hypothesis on the component means is expressed through the design matrices and the linear restriction matrices.

Speaker Adaptation Using i-Vector Based Clustering

  • Kim, Minsoo;Jang, Gil-Jin;Kim, Ji-Hwan;Lee, Minho
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.14 no.7
    • /
    • pp.2785-2799
    • /
    • 2020
  • We propose a novel speaker adaptation method using acoustic model clustering. The similarity of different speakers is defined by the cosine distance between their i-vectors (intermediate vectors), and various efficient clustering algorithms are applied to obtain a number of speaker subsets with different characteristics. The speaker-independent model is then retrained with the training data of the individual speaker subsets grouped by the clustering results, and unknown speech is recognized by the retrained model of the closest cluster. The proposed method is applied to a large-scale speech recognition system implemented by a hybrid hidden Markov model and deep neural network framework. An experiment was conducted to evaluate the word error rates using the Resource Management database. When the proposed speaker adaptation method using i-vector based clustering was applied, the performance, as compared to that of the conventional speaker-independent speech recognition model, improved in relative terms by as much as 12.2% for the conventional fully neural network and by as much as 10.5% for the bidirectional long short-term memory.
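The cluster-selection step described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes each speaker cluster is summarized by a centroid i-vector, and the three-dimensional vectors and the names `cosine_distance` and `closest_cluster` are hypothetical.

```python
import math

def cosine_distance(u, v):
    """1 minus the cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def closest_cluster(ivector, centroids):
    """Index of the centroid with the smallest cosine distance to ivector."""
    return min(range(len(centroids)),
               key=lambda k: cosine_distance(ivector, centroids[k]))

# Hypothetical centroids for two speaker clusters and one unseen speaker.
centroids = [[1.0, 0.0, 0.2], [0.0, 1.0, 0.1]]
new_speaker = [0.1, 0.9, 0.0]
cluster = closest_cluster(new_speaker, centroids)
```

In a system like the one described, the returned index would select which retrained acoustic model decodes the utterance.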

Development of a Clustering Model for Automatic Knowledge Classification (지식 분류의 자동화를 위한 클러스터링 모형 연구)

  • 정영미;이재윤
    • Journal of the Korean Society for Information Management
    • /
    • v.18 no.2
    • /
    • pp.203-230
    • /
    • 2001
  • The purpose of this study is to develop a document clustering model for automatic classification of knowledge. Two test collections of newspaper article texts and journal article abstracts are built for the clustering experiment. Various feature reduction criteria as well as term weighting methods are applied to the term sets of the test collections, and cosine and Jaccard coefficients are used as similarity measures. The performances of complete linkage and K-means clustering algorithms are compared using different feature selection methods and various term weights. It was found that complete linkage clustering outperforms the K-means algorithm, and that feature reduction up to almost 10% of the total feature sets does not lower the performance of document clustering to any significant extent.
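The two similarity measures named in this abstract can be illustrated with a short sketch. It assumes raw term frequencies for the cosine coefficient and plain term sets for the Jaccard coefficient; the study itself compares several term weighting schemes, so the weighting, the sample documents, and the function names here are assumptions for illustration only.

```python
import math
from collections import Counter

def cosine_coefficient(doc_a, doc_b):
    """Cosine of the angle between raw term-frequency vectors."""
    tf_a, tf_b = Counter(doc_a), Counter(doc_b)
    dot = sum(tf_a[t] * tf_b[t] for t in tf_a.keys() & tf_b.keys())
    norm = (math.sqrt(sum(c * c for c in tf_a.values()))
            * math.sqrt(sum(c * c for c in tf_b.values())))
    return dot / norm if norm else 0.0

def jaccard_coefficient(doc_a, doc_b):
    """Shared distinct terms divided by all distinct terms."""
    a, b = set(doc_a), set(doc_b)
    return len(a & b) / len(a | b) if a | b else 0.0

# Hypothetical tokenized documents.
doc1 = "stock market prices fall".split()
doc2 = "stock prices rise".split()
cos_sim = cosine_coefficient(doc1, doc2)
jac_sim = jaccard_coefficient(doc1, doc2)
```

Either coefficient can then feed a complete linkage or K-means procedure as the pairwise similarity between documents.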


Color Data Clustering Algorithm using Fuzzy Color Model (퍼지컬러 모델을 이용한 컬러 데이터 클러스터링 알고리즘)

  • Kim, Dae-Won;Lee, Kwang H.
    • Proceedings of the Korean Institute of Intelligent Systems Conference
    • /
    • 2002.05a
    • /
    • pp.119-122
    • /
    • 2002
  • The research interest of this paper is focused on the efficient clustering of arbitrary color data. In order to tackle this problem, we have tried to model the inherent uncertainty and vagueness of color data using a fuzzy color model. By taking a fuzzy approach to color modeling, we could make a soft decision for the vague regions between neighboring colors. The proposed fuzzy color model defines a three-dimensional fuzzy color ball and a color membership computation method with two inter-color distance measures. With the fuzzy color model, we developed a new fuzzy clustering algorithm for an efficient partition of color data. Each fuzzy cluster set has a cluster prototype, which is represented by a fuzzy color centroid.


Bayesian Curve Clustering in Microarray

  • Lee, Kyeong-Eun;Mallick, Bani K.
    • Proceedings of the Korean Data and Information Science Society Conference
    • /
    • 2006.04a
    • /
    • pp.39-42
    • /
    • 2006
  • We propose a Bayesian model-based approach using a mixture of Dirichlet processes model with discrete wavelet transform, for curve clustering in the microarray data with time-course gene expressions.


Neutron clustering in Monte Carlo iterated-source calculations

  • Sutton, Thomas M.;Mittal, Anudha
    • Nuclear Engineering and Technology
    • /
    • v.49 no.6
    • /
    • pp.1211-1218
    • /
    • 2017
  • Monte Carlo neutron transport codes generally use the method of successive generations to converge the fission source distribution to, and then maintain it at, the fundamental mode. Recently, a phenomenon called "clustering" has been noted, which produces fission distributions that are very far from the fundamental mode. In this study, a mathematical model of clustering in Monte Carlo has been developed. The model draws on previous work for continuous-time birth-death processes, as well as methods from the field of population genetics.

Modeling of Self-Constructed Clustering and Performance Evaluation (자기-구성 클러스터링의 모델링 및 성능평가)

  • Ryu Jeong woong;Kim Sung Suk;Song Chang kyu;Kim Sung Soo
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.30 no.6C
    • /
    • pp.490-496
    • /
    • 2005
  • In this paper, we propose a self-constructed clustering algorithm based on inference information of the fuzzy model. This method makes it possible to automatically detect and optimize the number of clusters and the parameters by using input-output data. The proposed method improves the performance of clustering through an extended supervised learning technique, which uses the output information as well as the input characteristics. To reflect the output information in the similarity measure used for clustering, we use the TSK fuzzy model. Conceptually, we design a learning method that feeds the output information back into the clustering, since the proposed algorithm separates the classes in the input data space. Simulations show that the proposed method is more effective than previous ones.

Identification of Multi-Fuzzy Model by means of HCM Clustering and Genetic Algorithms (HCM 클러스터링과 유전자 알고리즘을 이용한 다중 퍼지 모델 동정)

  • 박호성;오성권
    • Proceedings of the Institute of Control, Robotics and Systems Conference
    • /
    • 2000.10a
    • /
    • pp.370-370
    • /
    • 2000
  • In this paper, we design a Multi-Fuzzy model by means of HCM clustering and genetic algorithms for a nonlinear system. The HCM clustering method is used to determine the structure of the proposed Multi-Fuzzy model. The parameters of the membership functions of the Multi-Fuzzy model are identified by genetic algorithms. An aggregate performance index with a weighting factor is used to achieve a sound balance between the approximation and generalization abilities of the model. We use simplified inference and linear inference as the inference methods of the proposed Multi-Fuzzy model, and the standard least squares method for estimating its consequence parameters. Finally, we use some numerical data to evaluate the proposed Multi-Fuzzy model and discuss its usefulness.
