• Title/Summary/Keyword: data clustering

Search Result 2,769, Processing Time 0.031 seconds

Semantic Correspondence of Database Schema from Heterogeneous Databases using Self-Organizing Map

  • Dumlao, Menchita F.;Oh, Byung-Joo
    • Journal of IKEEE
    • /
    • v.12 no.4
    • /
    • pp.217-224
    • /
    • 2008
  • This paper provides a framework for semantic correspondence of heterogeneous databases using self- organizing map. It solves the problem of overlapping between different databases due to their different schemas. Clustering technique using self-organizing maps (SOM) is tested and evaluated to assess its performance when using different kinds of data. Preprocessing of database is performed prior to clustering using edit distance algorithm, principal component analysis (PCA), and normalization function to identify the features necessary for clustering.

  • PDF

Simultaneous Approach to Fuzzy Clustering and Quantification of Categorical Data with Missing Values

  • Honda, Katsuhiro;Nakamura, Yoshihito;Ichihashi, Hidetomo
    • Proceedings of the Korean Institute of Intelligent Systems Conference
    • /
    • 2003.09a
    • /
    • pp.36-39
    • /
    • 2003
  • This paper proposes a simultaneous application of homogeneity analysis and fuzzy clustering with in complete data. Taking the similarity between the loss of homogeneity in homogeneity analysis and the least squares criterion in principal component analysis into account, the new objective function is defined in a similar formulation to the linear fuzzy clustering with missing values. Numerical experiment shows the characteristic properties of the proposed method.

  • PDF

Clustering fMRI Time Series using Self-Organizing Map (자기 조직 신경망을 이용한 기능적 뇌영상 시계열의 군집화)

  • 임종윤;장병탁;이경민
    • Proceedings of the Korean Institute of Intelligent Systems Conference
    • /
    • 2001.12a
    • /
    • pp.251-254
    • /
    • 2001
  • 본 논문에서는 Self Organizing Map을 이용하여 fMRI data를 분석해 보았다. fMRl (functional Magnetic Resonance Imaging)는 인간의 뇌에 대한 비 침투적 연구 방법 중 최근에 각광받고 있는 것이다. Motor task를 수행하고 있는 피험자로부터 image data를 얻어내어 SOM을 적용하여 clustering한 결과 motor cortex 영역이 뚜렷하게 clustering 되었음을 알 수 있었다.

  • PDF

On the Categorical Variable Clustering

  • Kim, Dae-Hak
    • Journal of the Korean Data and Information Science Society
    • /
    • v.7 no.2
    • /
    • pp.219-226
    • /
    • 1996
  • Basic objective in cluster analysis is to discover natural groupings of items or variables. In general, variable clustering was conducted based on some similarity measures between variables which have binary characteristics. We propose a variable clustering method when variables have more categories ordered in some sense. We also consider some measures of association as a similarity between variables. Numerical example is included.

  • PDF

The Design of GA-based TSK Fuzzy Classifier and Its application (GA기반 TSK 퍼지 분류기의 설계 및 응용)

  • 곽근창;김승석;유정웅;전명근
    • Proceedings of the Korean Institute of Intelligent Systems Conference
    • /
    • 2001.12a
    • /
    • pp.233-236
    • /
    • 2001
  • In this paper, we propose a TSK-type fuzzy classifier using PCA(Principal Component Analysis), FCM(Fuzzy C-Means) clustering and hybrid GA(genetic algorithm). First, input data is transformed to reduce correlation among the data components by PCA. FCM clustering is applied to obtain a initial TSK-type fuzzy classifier. Parameter identification is performed by AGA(Adaptive Genetic Algorithm) and RLSE(Recursive Least Square Estimate). we applied the proposed method to Iris data classification problems and obtained a better performance than previous works.

  • PDF

K-means clustering using a center of gravity for grid-based sample (그리드 기반 표본의 무게중심을 이용한 케이-평균군집화)

  • Lee, Sun-Myung;Park, Hee-Chang
    • Journal of the Korean Data and Information Science Society
    • /
    • v.21 no.1
    • /
    • pp.121-128
    • /
    • 2010
  • K-means clustering is an iterative algorithm in which items are moved among sets of clusters until the desired set is reached. K-means clustering has been widely used in many applications, such as market research, pattern analysis or recognition, image processing, etc. It can identify dense and sparse regions among data attributes or object attributes. But k-means algorithm requires many hours to get k clusters that we want, because it is more primitive, explorative. In this paper we propose a new method of k-means clustering using a center of gravity for grid-based sample. It is more fast than any traditional clustering method and maintains its accuracy.

The Design of Granular-based Radial Basis Function Neural Network by Context-based Clustering (Context-based 클러스터링에 의한 Granular-based RBF NN의 설계)

  • Park, Ho-Sung;Oh, Sung-Kwun
    • The Transactions of The Korean Institute of Electrical Engineers
    • /
    • v.58 no.6
    • /
    • pp.1230-1237
    • /
    • 2009
  • In this paper, we develop a design methodology of Granular-based Radial Basis Function Neural Networks(GRBFNN) by context-based clustering. In contrast with the plethora of existing approaches, here we promote a development strategy in which a topology of the network is predominantly based upon a collection of information granules formed on a basis of available experimental data. The output space is granulated making use of the K-Means clustering while the input space is clustered with the aid of a so-called context-based fuzzy clustering. The number of information granules produced for each context is adjusted so that we satisfy a certain reconstructability criterion that helps us minimize an error between the original data and the ones resulting from their reconstruction involving prototypes of the clusters and the corresponding membership values. In contrast to "standard" Radial Basis Function neural networks, the output neuron of the network exhibits a certain functional nature as its connections are realized as local linear whose location is determined by the values of the context and the prototypes in the input space. The other parameters of these local functions are subject to further parametric optimization. Numeric examples involve some low dimensional synthetic data and selected data coming from the Machine Learning repository.

Comparing Classification Accuracy of Ensemble and Clustering Algorithms Based on Taguchi Design (다구찌 디자인을 이용한 앙상블 및 군집분석 분류 성능 비교)

  • Shin, Hyung-Won;Sohn, So-Young
    • Journal of Korean Institute of Industrial Engineers
    • /
    • v.27 no.1
    • /
    • pp.47-53
    • /
    • 2001
  • In this paper, we compare the classification performances of both ensemble and clustering algorithms (Data Bagging, Variable Selection Bagging, Parameter Combining, Clustering) to logistic regression in consideration of various characteristics of input data. Four factors used to simulate the logistic model are (1) correlation among input variables (2) variance of observation (3) training data size and (4) input-output function. In view of the unknown relationship between input and output function, we use a Taguchi design to improve the practicality of our study results by letting it as a noise factor. Experimental study results indicate the following: When the level of the variance is medium, Bagging & Parameter Combining performs worse than Logistic Regression, Variable Selection Bagging and Clustering. However, classification performances of Logistic Regression, Variable Selection Bagging, Bagging and Clustering are not significantly different when the variance of input data is either small or large. When there is strong correlation in input variables, Variable Selection Bagging outperforms both Logistic Regression and Parameter combining. In general, Parameter Combining algorithm appears to be the worst at our disappointment.

  • PDF

Hierarchical Clustering of Symbolic Objects based on Asymmetric Proximity (비대칭적 유사도 기반의 심볼릭 객체의 계층적 클러스터링)

  • Oh, Seung-Joon;Park, Chan-Woong
    • Journal of the Korean Institute of Intelligent Systems
    • /
    • v.22 no.6
    • /
    • pp.729-734
    • /
    • 2012
  • Clustering analysis has been widely used in numerous applications like pattern recognition, data analysis, intrusion detection, image processing, bioinformatics and so on. Much of previous work has been based on the numeric data only. However, symbolic data analysis has emerged to deal with variables that can have intervals, histograms, and even functions as values. In this paper, we propose a non symmetric proximity based clustering approach for symbolic objects. A method for clustering symbolic patterns based on the average similarity value(ASV) is explored. The results of the proposed clustering method differ from those of the existing methods and the results are very encouraging.

Model-based Clustering of DOA Data Using von Mises Mixture Model for Sound Source Localization

  • Dinh, Quang Nguyen;Lee, Chang-Hoon
    • International Journal of Fuzzy Logic and Intelligent Systems
    • /
    • v.13 no.1
    • /
    • pp.59-66
    • /
    • 2013
  • In this paper, we propose a probabilistic framework for model-based clustering of direction of arrival (DOA) data to obtain stable sound source localization (SSL) estimates. Model-based clustering has been shown capable of handling highly overlapped and noisy datasets, such as those involved in DOA detection. Although the Gaussian mixture model is commonly used for model-based clustering, we propose use of the von Mises mixture model as more befitting circular DOA data than a Gaussian distribution. The EM framework for the von Mises mixture model in a unit hyper sphere is degenerated for the 2D case and used as such in the proposed method. We also use a histogram of the dataset to initialize the number of clusters and the initial values of parameters, thereby saving calculation time and improving the efficiency. Experiments using simulated and real-world datasets demonstrate the performance of the proposed method.