• Title/Abstract/Keywords: model-based clustering

754 search results (processing time: 0.096 s)

A Bayesian Model-based Clustering with Dissimilarities

  • Oh, Man-Suk;Raftery, Adrian
    • Proceedings of the Korean Statistical Society Conference
    • /
    • Proceedings of the Korean Statistical Society 2003 Autumn Conference
    • /
    • pp.9-14
    • /
    • 2003
  • A Bayesian model-based clustering method is proposed for clustering objects on the basis of dissimilarities. It combines two basic ideas. The first is that the objects have latent positions in a Euclidean space, and that the observed dissimilarities are measurements of the Euclidean distances with error. The second is that the latent positions are generated from a mixture of multivariate normal distributions, each one corresponding to a cluster. We estimate the resulting model in a Bayesian way using Markov chain Monte Carlo. The method carries out multidimensional scaling and model-based clustering simultaneously, and yields good object configurations and good clustering results with reasonable measures of clustering uncertainty. In the examples we studied, the clustering results based on low-dimensional configurations were almost as good as those based on high-dimensional ones. Thus the method can be used as a tool for dimension reduction when clustering high-dimensional objects, which may be especially useful for visual inspection of clusters. We also propose a Bayesian criterion for choosing the dimension of the object configuration and the number of clusters simultaneously. This criterion is easy to compute and works reasonably well in simulations and real examples.
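
The two ideas in this abstract (latent positions drawn from a mixture of normals, and dissimilarities observed as distances with error) can be illustrated by simulating data from such a model. The following is only a sketch under stated assumptions, not the authors' implementation: the function name `simulate_dissimilarities`, the equal mixing weights, the spherical cluster covariance, and the zero-truncated Gaussian measurement error are all assumptions made here, since the abstract does not specify them. It does not perform the MCMC estimation.

```python
import math
import random


def simulate_dissimilarities(n_objects, cluster_means, cluster_sd=0.5,
                             noise_sd=0.1, seed=0):
    """Simulate labels, latent positions, and a noisy dissimilarity matrix.

    Assumptions (not from the abstract): equal mixing weights, spherical
    clusters, and Gaussian measurement error truncated to positive values.
    """
    rng = random.Random(seed)
    labels, positions = [], []
    for _ in range(n_objects):
        k = rng.randrange(len(cluster_means))        # pick a cluster uniformly
        labels.append(k)
        positions.append([rng.gauss(m, cluster_sd) for m in cluster_means[k]])

    d = [[0.0] * n_objects for _ in range(n_objects)]
    for i in range(n_objects):
        for j in range(i + 1, n_objects):
            true_dist = math.dist(positions[i], positions[j])
            noisy = rng.gauss(true_dist, noise_sd)
            while noisy <= 0.0:                      # rejection step: keep it positive
                noisy = rng.gauss(true_dist, noise_sd)
            d[i][j] = d[j][i] = noisy                # dissimilarities are symmetric
    return labels, positions, d


labels, positions, dissim = simulate_dissimilarities(20, [(0.0, 0.0), (5.0, 5.0)])
```

A fitting procedure in the spirit of the abstract would see only `dissim` and would have to recover both `positions` (up to rotation, reflection, and translation) and `labels`.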

Robustness Analysis of a Novel Model-Based Recommendation Algorithms in Privacy Environment

  • Ihsan Gunes
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • Vol. 18, No. 5
    • /
    • pp.1341-1368
    • /
    • 2024
  • The concept of privacy-preserving collaborative filtering (PPCF) has been gaining significant attention. Because model-based recommendation methods with privacy are more efficient online, they should be preferred over privacy-preserving memory-based schemes. Several studies in the current literature have examined ant colony clustering algorithms that are based on non-privacy collaborative filtering schemes. Nevertheless, the literature does not contain any studies that consider privacy in the context of ant colony clustering-based CF schemes. This study employed the ant colony clustering model-based PPCF scheme. Attacks like shilling or profile injection could potentially be successful against privacy-preserving model-based collaborative filtering techniques. Afterwards, the scheme's robustness was assessed by conducting a shilling attack using six different attack models. We utilize masked data-based profile injection attacks against a privacy-preserving ant colony clustering-based prediction algorithm. Subsequently, we conduct extensive experiments utilizing authentic data to assess its robustness against profile injection attacks. In addition, we evaluate the resilience of the ant colony clustering model-based PPCF against shilling attacks by comparing it to established PPCF memory-based and model-based prediction techniques. The empirical findings indicate that push attack models exerted a substantial influence on the predictions, whereas nuke attack models demonstrated limited efficacy.

Model-based Clustering of DOA Data Using von Mises Mixture Model for Sound Source Localization

  • Dinh, Quang Nguyen;Lee, Chang-Hoon
    • International Journal of Fuzzy Logic and Intelligent Systems
    • /
    • Vol. 13, No. 1
    • /
    • pp.59-66
    • /
    • 2013
  • In this paper, we propose a probabilistic framework for model-based clustering of direction of arrival (DOA) data to obtain stable sound source localization (SSL) estimates. Model-based clustering has been shown to be capable of handling highly overlapped and noisy datasets, such as those involved in DOA detection. Although the Gaussian mixture model is commonly used for model-based clustering, we propose using the von Mises mixture model, which is better suited to circular DOA data than a Gaussian distribution. The EM framework for the von Mises mixture model on a unit hypersphere is reduced to the 2D case and used as such in the proposed method. We also use a histogram of the dataset to initialize the number of clusters and the initial parameter values, thereby saving calculation time and improving efficiency. Experiments using simulated and real-world datasets demonstrate the performance of the proposed method.
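
As an illustration of EM for a von Mises mixture on angles (the 2D case mentioned in this abstract), the following is a minimal sketch rather than the paper's method. The names `fit_vm_mixture` and `bessel_i0` are hypothetical, the initial means are supplied by the caller instead of being derived from a histogram, and the concentration update uses a common closed-form approximation instead of an exact solution.

```python
import math
import random


def bessel_i0(x, terms=60):
    """Modified Bessel function I0 via its power series (adequate for x <= 50)."""
    total, term = 1.0, 1.0
    for k in range(1, terms):
        term *= (x / (2.0 * k)) ** 2
        total += term
    return total


def fit_vm_mixture(angles, init_means, n_iter=50, init_kappa=1.0, max_kappa=50.0):
    """EM for a mixture of von Mises distributions on angles in radians."""
    k = len(init_means)
    means = list(init_means)
    kappas = [init_kappa] * k
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: responsibility of each component for each angle.
        norms = [2.0 * math.pi * bessel_i0(kp) for kp in kappas]
        resp = []
        for theta in angles:
            dens = [weights[j] * math.exp(kappas[j] * math.cos(theta - means[j])) / norms[j]
                    for j in range(k)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: weighted circular mean and approximate concentration.
        for j in range(k):
            n_j = max(sum(r[j] for r in resp), 1e-12)
            c = sum(r[j] * math.cos(t) for r, t in zip(resp, angles))
            s = sum(r[j] * math.sin(t) for r, t in zip(resp, angles))
            weights[j] = n_j / len(angles)
            means[j] = math.atan2(s, c) % (2.0 * math.pi)
            r_bar = min(math.hypot(c, s) / n_j, 0.9999)   # mean resultant length
            # Assumption: closed-form approximation to the concentration update.
            kappas[j] = min(r_bar * (2.0 - r_bar ** 2) / (1.0 - r_bar ** 2), max_kappa)
    return weights, means, kappas


random.seed(0)
sample = ([random.vonmisesvariate(1.0, 8.0) for _ in range(300)]
          + [random.vonmisesvariate(4.0, 8.0) for _ in range(300)])
weights, means, kappas = fit_vm_mixture(sample, init_means=[0.5, 3.5])
```

Working with `cos` and `sin` sums rather than raw angle averages is what makes the estimate respect the wrap-around at 2π, which is the reason for preferring a circular distribution over a Gaussian for DOA data.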

Fine-Grained Mobile Application Clustering Model Using Retrofitted Document Embedding

  • Yoon, Yeo-Chan;Lee, Junwoo;Park, So-Young;Lee, Changki
    • ETRI Journal
    • /
    • Vol. 39, No. 4
    • /
    • pp.443-454
    • /
    • 2017
  • In this paper, we propose a fine-grained mobile application clustering model using retrofitted document embedding. To automatically determine the clusters and their numbers with no predefined categories, the proposed model initializes the clusters based on title keywords and then merges similar clusters. For improved clustering performance, the proposed model distinguishes between an accurate clustering step with titles and an expansive clustering step with descriptions. During the accurate clustering step, an automatically tagged set is constructed as a result. This set is utilized to learn a high-performance document vector. During the expansive clustering step, more applications are then classified using this document vector. Experimental results showed that the purity of the proposed model increased by 0.19, and the entropy decreased by 1.18, compared with the K-means algorithm. In addition, the mean average precision improved by more than 0.09 in a comparison with a support vector machine classifier.
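
Purity and entropy, the two clustering measures reported in this abstract, can be computed as in the sketch below. It uses common textbook definitions (purity as the fraction of items belonging to the majority class of their cluster, entropy as the size-weighted average of per-cluster class entropy in bits); the paper may define or normalize them differently, and the function name `purity_and_entropy` is an assumption made here.

```python
import math
from collections import Counter, defaultdict


def purity_and_entropy(cluster_ids, true_labels):
    """Return (purity, entropy) of a clustering against reference labels."""
    groups = defaultdict(list)
    for c, y in zip(cluster_ids, true_labels):
        groups[c].append(y)
    n = len(true_labels)

    # Purity: each cluster contributes the count of its most frequent class.
    purity = sum(max(Counter(members).values()) for members in groups.values()) / n

    # Entropy: class entropy inside each cluster, weighted by cluster size.
    entropy = 0.0
    for members in groups.values():
        size = len(members)
        h = -sum((cnt / size) * math.log2(cnt / size)
                 for cnt in Counter(members).values())
        entropy += (size / n) * h
    return purity, entropy


purity, entropy = purity_and_entropy([0, 0, 0, 1, 1, 1],
                                     ["a", "a", "b", "b", "b", "b"])
```

Higher purity and lower entropy both indicate clusters that are more homogeneous with respect to the reference labels.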

Curve Clustering in Microarray

  • Lee, Kyeong-Eun
    • Journal of the Korean Data and Information Science Society
    • /
    • Vol. 15, No. 3
    • /
    • pp.575-584
    • /
    • 2004
  • We propose a Bayesian model-based approach, using a mixture of Dirichlet processes model with the discrete wavelet transform, for curve clustering in microarray data with time-course gene expression.

Neuro-Fuzzy Modeling based on Self-Organizing Clustering (자기구성 클러스터링 기반 뉴로-퍼지 모델링)

  • Kim Sung-Suk;Ryu Jeong-Woong;Kim Yong-Tae
    • Journal of the Korean Institute of Intelligent Systems
    • /
    • Vol. 15, No. 6
    • /
    • pp.688-694
    • /
    • 2005
  • In this paper, we propose a new neuro-fuzzy modeling method based on clustering-based learning. In the proposed clustering method, the number of clusters is inferred automatically and the cluster parameters are optimized simultaneously; a neuro-fuzzy model is also learned from the clustering information at the same time. In previous modeling methods, clustering and model learning are performed independently, with no exchange of information between them. In the proposed method, by contrast, the overall neuro-fuzzy model is generated using both clustering and model learning, and the information from the model output is used in clustering the input. The proposed method improves the computational load of modeling using the subtractive clustering method. Simulation results show that the proposed method is effective compared with previous methods.

Online nonparametric Bayesian analysis of parsimonious Gaussian mixture models and scenes clustering

  • Zhou, Ri-Gui;Wang, Wei
    • ETRI Journal
    • /
    • Vol. 43, No. 1
    • /
    • pp.74-81
    • /
    • 2021
  • The mixture model is a very powerful and flexible tool in clustering analysis. Based on the Dirichlet process and parsimonious Gaussian distribution, we propose a new nonparametric mixture framework for solving challenging clustering problems. Meanwhile, the inference of the model depends on the efficient online variational Bayesian approach, which enhances the information exchange between the whole and the part to a certain extent and applies to scalable datasets. The experiments on the scene database indicate that the novel clustering framework, when combined with a convolutional neural network for feature extraction, has meaningful advantages over other models.

Determining on Model-based Clusters of Time Series Data (시계열데이터의 모델기반 클러스터 결정)

  • Jeon, Jin-Ho;Lee, Gye-Sung
    • The Journal of the Korea Contents Association
    • /
    • Vol. 7, No. 6
    • /
    • pp.22-30
    • /
    • 2007
  • Most real-world systems, such as the world economy, stock markets, and medical applications, involve a series of dynamic and complex phenomena. A common way to understand these systems is to build a model and analyze the behavior of the system. In this paper, we investigate methods for finding the best clustering of time series data. As a first step, a BIC (Bayesian Information Criterion) approximation is used to determine the number of clusters. A search technique to improve clustering efficiency is also suggested, based on an analysis of the relationship between data size and BIC values. For clustering, two approaches, model-based and similarity-based, are analyzed and compared. A number of experiments were performed on real data (stock prices) to check validity. The experiments confirm that the BIC approximation suggests the best number of clusters, provided that the amount of data is relatively large, and that model-based clustering produces more reliable clusterings than similarity-based methods.
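
The idea of using BIC to pick the number of clusters can be sketched as follows. This is an illustration under stated assumptions, not the paper's procedure: a one-dimensional Gaussian mixture fitted by EM stands in for the time series models, BIC is taken in the form -2 log L + p log n (lower is better) with p = 3k - 1 free parameters, and the names `gmm_loglik_1d`, `bic`, and `choose_k` are hypothetical.

```python
import math
import random


def normal_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def gmm_loglik_1d(xs, k, n_iter=100, var_floor=1e-3):
    """Fit a k-component 1-D Gaussian mixture by EM; return its log-likelihood."""
    n = len(xs)
    ordered = sorted(xs)
    means = [ordered[int((j + 0.5) * n / k)] for j in range(k)]   # quantile init
    grand_mean = sum(xs) / n
    grand_var = max(sum((x - grand_mean) ** 2 for x in xs) / n, var_floor)
    variances = [grand_var] * k
    weights = [1.0 / k] * k

    def mixture_terms(x):
        return [weights[j] * normal_pdf(x, means[j], variances[j]) for j in range(k)]

    for _ in range(n_iter):
        # E-step: responsibilities.
        resp = []
        for x in xs:
            terms = mixture_terms(x)
            total = max(sum(terms), 1e-300)
            resp.append([t / total for t in terms])
        # M-step: weights, means, and floored variances.
        for j in range(k):
            n_j = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = n_j / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / n_j
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / n_j
            variances[j] = max(var_j, var_floor)
    return sum(math.log(max(sum(mixture_terms(x)), 1e-300)) for x in xs)


def bic(loglik, n_params, n_obs):
    """BIC in the 'lower is better' convention."""
    return -2.0 * loglik + n_params * math.log(n_obs)


def choose_k(xs, max_k=4):
    """Score k = 1..max_k and return the k with the smallest BIC."""
    scores = {k: bic(gmm_loglik_1d(xs, k), 3 * k - 1, len(xs))
              for k in range(1, max_k + 1)}
    return min(scores, key=scores.get), scores


random.seed(1)
data = ([random.gauss(0.0, 1.0) for _ in range(200)]
        + [random.gauss(10.0, 1.0) for _ in range(200)])
best_k, bic_scores = choose_k(data)
```

Because the penalty term grows only with log n while the log-likelihood grows with n, the criterion becomes more decisive as the dataset grows, which is consistent with the abstract's remark that BIC works best when the amount of data is relatively large.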

Comparison of time series clustering methods and application to power consumption pattern clustering

  • Kim, Jaehwi;Kim, Jaehee
    • Communications for Statistical Applications and Methods
    • /
    • Vol. 27, No. 6
    • /
    • pp.589-602
    • /
    • 2020
  • The development of smart grids has made it easy to collect large amounts of power data. Power consumption data share some common patterns, which makes clustering useful when analyzing power big data. In this paper, clustering analysis based on distance functions for time series and on clustering algorithms is used to discover patterns in power consumption data. We use 10 distance measures to find clusters that take the characteristics of time series data into account. A simulation study is conducted to compare the distance measures for clustering. Cluster validity measures such as the error rate, similarity index, Dunn index, and silhouette values are also calculated and compared. Real power consumption data are then clustered using the five distance measures that performed better than the others in the simulation.
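
To show why the choice of distance matters for time series, the sketch below compares the pointwise Euclidean distance with dynamic time warping (DTW). The abstract does not list its 10 distance measures, so using these two is an assumption made here for illustration, as are the function names.

```python
import math


def euclidean_distance(a, b):
    """Pointwise Euclidean distance between two equal-length series."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(a, b):
    """Dynamic time warping distance with absolute-difference local cost."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # repeat b[j-1]
                                     cost[i][j - 1],      # repeat a[i-1]
                                     cost[i - 1][j - 1])  # advance both
    return cost[n][m]


# Two series with the same peak, shifted by one time step.
series_a = [0, 0, 1, 2, 1, 0, 0]
series_b = [0, 1, 2, 1, 0, 0, 0]
print(euclidean_distance(series_a, series_b))  # 2.0
print(dtw_distance(series_a, series_b))        # 0.0
```

The Euclidean distance penalizes the time shift, while DTW aligns the two peaks and treats the series as identical in shape. Which behavior is preferable depends on whether the timing of consumption peaks is itself a feature of interest.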

Gene Expression Pattern Analysis via Latent Variable Models Coupled with Topographic Clustering

  • Chang, Jeong-Ho;Chi, Sung Wook;Zhang, Byoung Tak
    • Genomics & Informatics
    • /
    • Vol. 1, No. 1
    • /
    • pp.32-39
    • /
    • 2003
  • We present a latent variable model-based approach to the analysis of gene expression patterns, coupled with topographic clustering. The aspect model, a latent variable model for dyadic data, is applied to extract latent patterns underlying complex variations of gene expression levels. Topographic clustering is then performed to find coherent groups of genes, based on the extracted latent patterns as well as individual gene expression behaviors. Applied to cell cycle-regulated genes of the yeast Saccharomyces cerevisiae, the proposed method discovered biologically meaningful patterns related to characteristic expression behavior in particular cell cycle phases. In addition, displaying the variation in the composition of these latent patterns on the cluster map made the resulting cluster structure easier to interpret. From this, we argue that latent variable models, coupled with topographic clustering, are a promising tool for exploratory analysis of gene expression data.