• Title/Abstract/Keywords: clustering model

Search results: 1,218 items (processing time 0.035 seconds)

Comparison of time series clustering methods and application to power consumption pattern clustering

  • Kim, Jaehwi;Kim, Jaehee
    • Communications for Statistical Applications and Methods
    • /
    • Vol. 27, No. 6
    • /
    • pp.589-602
    • /
    • 2020
  • The development of smart grids has enabled the easy collection of a large amount of power data. Power consumption data share common patterns, which makes clustering useful when analyzing large-scale power data. In this paper, cluster analysis combines distance functions for time series with clustering algorithms to discover patterns in power consumption data. We use 10 distance measures to find clusters that reflect the characteristics of time series data. A simulation study compares the distance measures for clustering. Cluster validity measures such as error rate, similarity index, Dunn index, and silhouette values are also calculated and compared. Real power consumption data are then clustered using the five distance measures that performed best in the simulation.
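
The Dunn index named in the abstract above can be illustrated with a minimal sketch. The abstract does not list the 10 distance measures, so plain Euclidean distance between equal-length series is used here as a stand-in; the function names and the toy data are hypothetical, and every cluster is assumed to be non-empty with at least one cluster holding two or more series.

```python
import math
from itertools import combinations

def euclidean(a, b):
    # Stand-in distance between two equal-length series (an assumption;
    # the paper's own distance measures are not given in the abstract).
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def dunn_index(clusters, dist=euclidean):
    # Smallest distance between series that belong to different clusters.
    min_separation = min(
        dist(p, q)
        for c1, c2 in combinations(clusters, 2)
        for p in c1
        for q in c2
    )
    # Largest distance between two series inside the same cluster.
    max_diameter = max(
        dist(p, q)
        for c in clusters
        for p, q in combinations(c, 2)
    )
    return min_separation / max_diameter

# Toy example: two tight, well-separated clusters of length-2 series.
toy_clusters = [[(0, 0), (0, 1)], [(5, 0), (5, 1)]]
dunn_value = dunn_index(toy_clusters)
```

A larger value indicates compact clusters that lie far apart, which is why the index can be compared across distance measures on the same data.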

Gene Expression Pattern Analysis via Latent Variable Models Coupled with Topographic Clustering

  • Chang, Jeong-Ho;Chi, Sung Wook;Zhang, Byoung Tak
    • Genomics & Informatics
    • /
    • Vol. 1, No. 1
    • /
    • pp.32-39
    • /
    • 2003
  • We present a latent variable model-based approach to the analysis of gene expression patterns, coupled with topographic clustering. The aspect model, a latent variable model for dyadic data, is applied to extract latent patterns underlying complex variations of gene expression levels. Topographic clustering is then performed to find coherent groups of genes, based on the extracted latent patterns as well as individual gene expression behaviors. Applied to cell cycle-regulated genes of the yeast Saccharomyces cerevisiae, the proposed method discovered biologically meaningful patterns related to characteristic expression behavior in particular cell cycle phases. In addition, displaying the variation in the composition of these latent patterns on the cluster map made the resulting cluster structure easier to interpret. From this, we argue that latent variable models, coupled with topographic clustering, are a promising tool for exploratory analysis of gene expression data.

Normal Mixture Model with General Linear Regressive Restriction: Applied to Microarray Gene Clustering

  • Kim, Seung-Gu
    • Communications for Statistical Applications and Methods
    • /
    • Vol. 14, No. 1
    • /
    • pp.205-213
    • /
    • 2007
  • In this paper, we propose a normal mixture model whose component means are subject to a general linear restriction based on linear regression, and we provide a fitting method that uses the EM algorithm and Lagrange multipliers. The model is applied to gene clustering of microarray expression data, where it performs very well on a real data set. The model also lets an analyst obtain the clusters of interest by expressing hypotheses about the component means through the design matrices and the linear restriction matrices.

Speaker Adaptation Using i-Vector Based Clustering

  • Kim, Minsoo;Jang, Gil-Jin;Kim, Ji-Hwan;Lee, Minho
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • Vol. 14, No. 7
    • /
    • pp.2785-2799
    • /
    • 2020
  • We propose a novel speaker adaptation method using acoustic model clustering. The similarity of different speakers is defined by the cosine distance between their i-vectors (intermediate vectors), and various efficient clustering algorithms are applied to obtain a number of speaker subsets with different characteristics. The speaker-independent model is then retrained with the training data of the individual speaker subsets grouped by the clustering results, and unknown speech is recognized by the retrained model of the closest cluster. The proposed method is applied to a large-scale speech recognition system implemented with a hybrid hidden Markov model and deep neural network framework. An experiment was conducted to evaluate the word error rates using the Resource Management database. When the proposed speaker adaptation method using i-vector based clustering was applied, performance improved over the conventional speaker-independent speech recognition model by as much as 12.2% (relative) for the conventional fully neural network, and by as much as 10.5% (relative) for the bidirectional long short-term memory.
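
The two steps the abstract above relies on, cosine distance between i-vectors and selection of the closest cluster, can be sketched as follows. The abstract does not say how a cluster is represented at recognition time, so comparing against one centroid vector per cluster is an assumption, and the function names and the two-dimensional toy vectors are hypothetical (real i-vectors have far more dimensions).

```python
import math

def cosine_distance(u, v):
    # 1 minus the cosine of the angle between u and v (both non-zero).
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def closest_cluster(ivector, centroids):
    # Index of the centroid with the smallest cosine distance to ivector.
    # Using centroids as cluster representatives is an assumption.
    return min(
        range(len(centroids)),
        key=lambda k: cosine_distance(ivector, centroids[k]),
    )

# Toy example with two hypothetical cluster centroids.
centroids = [(1.0, 0.0), (0.0, 1.0)]
chosen = closest_cluster((0.9, 0.1), centroids)
```

In the setting described by the abstract, the returned index would pick which retrained acoustic model decodes the utterance.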

Development of a Clustering Model for Automatic Knowledge Classification

  • 정영미;이재윤
    • 정보관리학회지 (Journal of the Korean Society for Information Management)
    • /
    • Vol. 18, No. 2
    • /
    • pp.203-230
    • /
    • 2001
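  • This study proposes an optimal clustering model for the automatic classification of knowledge based on documents. For the clustering experiments, we built a test collection of newspaper articles and a test collection of journal article abstracts, and developed WACS, a measure for evaluating classification performance. The sets of terms used as classification features were generated by applying various feature reduction criteria, and various term weighting schemes were used. The cosine and Jaccard coefficients were applied as similarity measures, and the hierarchical complete-linkage method and the non-hierarchical K-means method were used as clustering algorithms. In the experiments, performance was better on the full-text newspaper article collection, and complete linkage outperformed K-means. Applying inverse document frequency had a positive effect in complete-linkage clustering but not in K-means clustering. Performance did not degrade much even when only 7.66% of all features were used, and in K-means clustering this reduction actually improved performance.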
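
The two similarity coefficients named in the abstract above can be illustrated with a minimal sketch using their standard textbook definitions; the abstract does not give the exact formulas or the term weighting used in the study, so the set-based Jaccard variant, the function names, and the toy documents are assumptions.

```python
import math

def cosine_coefficient(doc_a, doc_b):
    # doc_a, doc_b: dicts mapping term -> weight (both non-empty).
    dot = sum(w * doc_b[t] for t, w in doc_a.items() if t in doc_b)
    norm_a = math.sqrt(sum(w * w for w in doc_a.values()))
    norm_b = math.sqrt(sum(w * w for w in doc_b.values()))
    return dot / (norm_a * norm_b)

def jaccard_coefficient(doc_a, doc_b):
    # Set-based Jaccard: shared terms divided by all distinct terms.
    terms_a, terms_b = set(doc_a), set(doc_b)
    return len(terms_a & terms_b) / len(terms_a | terms_b)

# Toy documents with hypothetical unit term weights.
d1 = {"cluster": 1.0, "model": 1.0}
d2 = {"cluster": 1.0, "fuzzy": 1.0}
cos_sim = cosine_coefficient(d1, d2)
jac_sim = jaccard_coefficient(d1, d2)
```

Either coefficient can feed a clustering algorithm as the pairwise document similarity; the cosine version takes term weights into account, while this Jaccard version only looks at which terms are present.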

Color Data Clustering Algorithm using Fuzzy Color Model

  • Kim, Dae-Won;Lee, Kwang H.
    • Korean Institute of Intelligent Systems: Conference Proceedings
    • /
    • Korea Fuzzy Logic and Intelligent Systems Society, 2002 Spring Conference and Extraordinary General Meeting
    • /
    • pp.119-122
    • /
    • 2002
  • The research interest of this paper is focused on the efficient clustering of arbitrary color data. To tackle this problem, we have tried to model the inherent uncertainty and vagueness of color data using a fuzzy color model. By taking a fuzzy approach to color modeling, we could make a soft decision for the vague regions between neighboring colors. The proposed fuzzy color model defines a three-dimensional fuzzy color ball and a color membership computation method with two inter-color distance measures. With the fuzzy color model, we developed a new fuzzy clustering algorithm for an efficient partition of color data. Each fuzzy cluster set has a cluster prototype, which is represented by a fuzzy color centroid.

Bayesian Curve Clustering in Microarray

  • 이경은
    • Korean Data and Information Science Society: Conference Proceedings
    • /
    • Korean Data and Information Science Society, 2006 Proceedings of Joint Conference of KDISS and KDAS
    • /
    • pp.39-42
    • /
    • 2006
  • We propose a Bayesian model-based approach to curve clustering of microarray data with time-course gene expression, using a mixture of Dirichlet processes model with the discrete wavelet transform.

Neutron clustering in Monte Carlo iterated-source calculations

  • Sutton, Thomas M.;Mittal, Anudha
    • Nuclear Engineering and Technology
    • /
    • Vol. 49, No. 6
    • /
    • pp.1211-1218
    • /
    • 2017
  • Monte Carlo neutron transport codes generally use the method of successive generations to converge the fission source distribution to, and then maintain it at, the fundamental mode. Recently, a phenomenon called "clustering" has been noted, which produces fission distributions that are very far from the fundamental mode. In this study, a mathematical model of clustering in Monte Carlo has been developed. The model draws on previous work for continuous-time birth-death processes, as well as methods from the field of population genetics.

Modeling of Self-Constructed Clustering and Performance Evaluation

  • 유정웅;김승석;송창규;김성수
    • 한국통신학회논문지 (The Journal of Korean Institute of Communications and Information Sciences)
    • /
    • Vol. 30, No. 6C
    • /
    • pp.490-496
    • /
    • 2005
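  • This paper proposes a clustering method that determines its structure autonomously using the inference information of a fuzzy inference system. The proposed method uses the given input-output data to estimate the number of clusters autonomously while simultaneously optimizing their parameters. By extending the unsupervised learning found in typical clustering methods to supervised learning, cluster estimation takes the input-output causal relationship into account, with the aim of improving the performance of the overall model. Because output information is applied to clustering in the input space, the classes can be separated more readily during clustering. Simulations that compare against previous results demonstrate the usefulness of the proposed method.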

Identification of Multi-Fuzzy Model by means of HCM Clustering and Genetic Algorithms

  • 박호성;오성권
    • Institute of Control, Robotics and Systems: Conference Proceedings
    • /
    • Institute of Control, Robotics and Systems, Proceedings of the 15th Conference, 2000
    • /
    • pp.370-370
    • /
    • 2000
  • In this paper, we design a Multi-Fuzzy model for a nonlinear system by means of HCM clustering and genetic algorithms. The HCM clustering method is used to determine the structure of the proposed Multi-Fuzzy model. The parameters of the membership functions of the Multi-Fuzzy model are identified by genetic algorithms. An aggregate performance index with a weighting factor is used to achieve a sound balance between the approximation and generalization abilities of the model. We use simplified inference and linear inference as the inference methods of the proposed Multi-Fuzzy model, and the standard least squares method for estimating its consequence parameters. Finally, we use numerical data to evaluate the proposed Multi-Fuzzy model and discuss its usefulness.