• Title/Summary/Keyword: Data Clustering

An Optimization Approach to Data Clustering

  • Kim, Ju-Mi;Olafsson, Sigurdur
    • Proceedings of the Korean Operations and Management Science Society Conference / 2005.05a / pp.621-628 / 2005
  • Scalability of clustering algorithms is a critical issue facing the data mining community. This is particularly true for computationally intense tasks such as data clustering. Random sampling of instances is one possible means of achieving scalability, but a pervasive problem with this approach is how to deal with the noise that sampling introduces into the evaluation of the learning algorithm. This paper develops a new optimization-based clustering approach using an algorithm specifically designed for noisy performance. Numerical results illustrate that this algorithm can achieve substantial savings in computational time without sacrificing solution quality.

Polynomial Fuzzy Radial Basis Function Neural Network Classifiers Realized with the Aid of Boundary Area Decision

  • Roh, Seok-Beom;Oh, Sung-Kwun
    • Journal of Electrical Engineering and Technology / v.9 no.6 / pp.2098-2106 / 2014
  • In the area of clustering, there are numerous approaches to constructing clusters in the input space. For regression problems, when the clusters form part of the overall model, the relationships between the input space and the output space are essential and have to be taken into consideration. Conditional Fuzzy C-Means (c-FCM) clustering offers an opportunity to analyze the structure in the input space with a mechanism of supervision implied by the distribution of data in the output space. However, like other clustering methods, c-FCM focuses on the distribution of the data. In this paper, we introduce a new method which, by making use of an ambiguity index, focuses on the boundaries of the clusters, whose determination is essential to the quality of the ensuing classification procedures. The design is illustrated with numeric examples that provide detailed insight into the performance of the fuzzy classifiers and quantify several essential design aspects.
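
A minimal sketch of the conditional membership update that c-FCM is usually described with, assuming the standard formulation in which the memberships of a point sum to its context value f (derived from the output space) rather than to 1. The function name, arguments, and sample values are illustrative and do not come from the paper; the ambiguity-index method the paper proposes is not shown.

```python
import math


def conditional_memberships(x, prototypes, f, m=2.0):
    """Memberships of point x in each cluster, scaled so that they sum to f.

    x          -- one data point (tuple of floats)
    prototypes -- list of cluster prototypes (tuples of floats)
    f          -- context value in [0, 1] attached to x (assumed given)
    m          -- fuzzification coefficient, m > 1
    """
    dists = [math.dist(x, v) for v in prototypes]

    # If x coincides with one or more prototypes, split f among them.
    zeros = [i for i, d in enumerate(dists) if d == 0.0]
    if zeros:
        return [f / len(zeros) if i in zeros else 0.0
                for i in range(len(prototypes))]

    exponent = 2.0 / (m - 1.0)
    return [f / sum((d_i / d_j) ** exponent for d_j in dists)
            for d_i in dists]


# Hypothetical example: two prototypes, context value 0.5.
u = conditional_memberships((1.0, 0.0), [(0.0, 0.0), (3.0, 0.0)], f=0.5)
print(u, sum(u))
```

With f = 1 for every point, this reduces to the ordinary fuzzy c-means membership update.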

Fundamental Considerations: Impact of Sensor Characteristics, Application Environments in Wireless Sensor Networks

  • Choi, Dongmin;Chung, Ilyong
    • Journal of Korea Multimedia Society / v.17 no.4 / pp.441-457 / 2014
  • From recent performance evaluations of clustering schemes in wireless sensor networks, we found that most did not consider the various sensor characteristics and application environments. Without considering these, the evaluation results are difficult to trust because these networks are application-specific. In this paper, for a fair evaluation, we measured how the performance of several clustering schemes varies with sensor data pattern, number of sensors per node, density of points of interest (POI; data density), and sensor coverage. According to the experimental results, clustering methods are easily influenced by POI variation. Network lifetime and data accuracy are also slightly influenced by sensor coverage and the number of sensors. Therefore, a fair evaluation cannot be expected for a clustering scheme that did not consider these conditions.

Projection Pursuit K-Means Visual Clustering

  • Kim, Mi-Kyung;Huh, Myung-Hoe
    • Journal of the Korean Statistical Society / v.31 no.4 / pp.519-532 / 2002
  • K-means clustering is a well-known method for partitioning multivariate observations. Recently, the method has been widely implemented in data mining software because of its computational efficiency in handling large data sets. However, it does not yield a suitable visual display of multivariate observations, which is especially important in the exploratory stage of data analysis. The aim of this study is to develop a K-means clustering method that enables visual display of multivariate observations in a low-dimensional space, for which the projection pursuit method is adopted. We propose a computationally inexpensive and reliable algorithm and provide two numerical examples.

On Color Cluster Analysis with Three-dimensional Fuzzy Color Ball

  • Kim, Dae-Won
    • Journal of the Korean Institute of Intelligent Systems / v.18 no.2 / pp.262-267 / 2008
  • The focus of this paper is on devising an efficient clustering method for arbitrary color data. To tackle this problem, the inherent uncertainty and vagueness of color are represented by a fuzzy color model. By taking a fuzzy approach to color representation, the proposed model makes a soft decision for the vague regions between neighboring colors. A definition of a three-dimensional fuzzy color ball is introduced, and the degree of membership of a color is computed by employing a distance measure between a fuzzy color and color data. With the fuzzy color model, a novel fuzzy clustering algorithm for efficient partitioning of color data is developed.

Comparison of Classification Rate Between BP and ANFIS with FCM Clustering Method on Off-line PD Model of Stator Coil

  • Park Seong-Hee;Lim Kee-Joe;Kang Seong-Hwa;Seo Jeong-Min;Kim Young-Geun
    • KIEE International Transactions on Electrophysics and Applications / v.5C no.3 / pp.138-142 / 2005
  • In this paper, we compare the recognition rates of neural networks (NN) and a clustering method as schemes for off-line diagnosis of partial discharge (PD) occurring at the stator coil of a traction motor. To acquire PD data, three defective models were made. PD data for classification were acquired from a PD detector, and statistical distributions were then calculated to classify the model discharge sources. These statistical distributions were used as input data for two classification tools: BP (back-propagation algorithm) and ANFIS (adaptive network-based fuzzy inference system) pre-processed with the FCM (fuzzy c-means) clustering method. The classification rate of BP was somewhat higher than that of ANFIS, but ANFIS was better than BP in other respects: learning time, number of parameters, and simplicity of the algorithm.

New Sequential Clustering Combination for Rule Generation System (규칙 생성 시스템을 위한 새로운 연속 클러스터링 조합)

  • Kim, Sung Suk;Choi, Ho Jin
    • Journal of Internet Computing and Services / v.13 no.5 / pp.1-8 / 2012
  • In this paper, we propose a new combination of clustering methods, driven by numerical data, for a rule generation mechanism. In a large and complicated space, a single clustering method can achieve only limited performance. To overcome this limitation, hybrid combined methods divide the problem into simpler cluster estimation tasks. The proposed method combines mountain clustering and modified Chen clustering to extract detailed cluster information from complicated data distributions in a non-parametric space. It can generate rules automatically using an advanced density-based operation when intelligent systems, including neural networks and fuzzy inference systems, are generated from the clustering results. The results of the mechanism can also serve as information for a decision support system to infer useful knowledge. The method can be extended to healthcare and medical decision support systems to help experts or specialists. We demonstrate the usefulness of the proposed method through simulation results.

Local Distribution Based Density Clustering for Speaker Diarization (화자분할을 위한 지역적 특성 기반 밀도 클러스터링)

  • Rho, Jinsang;Shon, Suwon;Kim, Sung Soo;Lee, Jae-Won;Ko, Hanseok
    • The Journal of the Acoustical Society of Korea / v.34 no.4 / pp.303-309 / 2015
  • Speaker diarization is the task of determining the speakers in unlabeled data, and DBSCAN (Density-Based Spatial Clustering of Applications with Noise) has been widely used in speaker diarization for its simplicity and computational efficiency. One challenging issue, however, is that if different clusters in a non-spatial dataset are adjacent to each other, over-clustering may occur, which degrades the performance of DBSCAN. In this paper, we identify the drawbacks of DBSCAN and propose a new density clustering algorithm based on the local distribution property around each object. Variable density criteria for the local density and spreadness of objects are used for effective data clustering. We compare the proposed algorithm to DBSCAN in terms of clustering accuracy. Experimental results confirm that the proposed algorithm achieves higher accuracy than DBSCAN without over-clustering and that the new approach based on local density and object spreadness is efficient.
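
For reference, a minimal sketch of the baseline DBSCAN procedure that the abstract compares against, not the proposed local-distribution algorithm. Here eps and min_pts are the usual neighborhood radius and density threshold; the brute-force neighbor search, the function names, and the sample points are illustrative assumptions.

```python
import math

NOISE = -1


def region_query(points, i, eps):
    """Indices of all points within eps of points[i] (including i itself)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters, NOISE for outliers."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: reachable but not core
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is a core point: keep expanding
    return labels


# Hypothetical data: two dense groups and one isolated point.
pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
print(dbscan(pts, eps=1.5, min_pts=3))
```

Because eps and min_pts are global, every region is judged by the same density threshold, which is the limitation a locally adaptive criterion is meant to address.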

An Efficient Clustering Method based on Multi Centroid Set using MapReduce (맵리듀스를 이용한 다중 중심점 집합 기반의 효율적인 클러스터링 방법)

  • Kang, Sungmin;Lee, Seokjoo;Min, Jun-ki
    • KIISE Transactions on Computing Practices / v.21 no.7 / pp.494-499 / 2015
  • As the size of data increases, it becomes important to identify its properties by analyzing big data. In this paper, we propose an efficient k-Means-based clustering technique, called MCSK-Means (Multi Centroid Set k-Means), using the distributed parallel processing framework MapReduce. A problem with the k-Means algorithm is that clustering accuracy depends on randomly created initial centroids. To alleviate this problem, the MCSK-Means algorithm reduces the dependency on initial centroids by using m centroid sets, each consisting of k centroids. In addition, we apply agglomerative hierarchical clustering to create k centroids from the centroids in the m centroid sets that result from the clustering phase. We implemented MCSK-Means on the MapReduce framework to process big data efficiently.
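
The two phases described above can be sketched on a single machine as follows. This is only an illustration under stated assumptions, not the authors' MapReduce implementation: each of the m centroid sets is produced by an ordinary k-means run from a different random start, and the pooled m*k centroids are merged agglomeratively with centroid linkage, which is an assumed choice. All function names and the sample data are hypothetical.

```python
import math
import random


def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(c) / len(points) for c in zip(*points))


def kmeans(data, k, rng, iters=20):
    """Plain Lloyd iterations from k randomly sampled initial centroids."""
    centroids = rng.sample(data, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in data:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            groups[nearest].append(p)
        # Keep the previous centroid if a cluster ends up empty.
        centroids = [mean(g) if g else c for g, c in zip(groups, centroids)]
    return centroids


def merge_centroids(centroids, k):
    """Agglomeratively merge the closest groups until only k remain."""
    groups = [[c] for c in centroids]
    while len(groups) > k:
        pairs = [(a, b) for a in range(len(groups))
                 for b in range(a + 1, len(groups))]
        a, b = min(pairs, key=lambda ab: math.dist(mean(groups[ab[0]]),
                                                   mean(groups[ab[1]])))
        groups[a].extend(groups.pop(b))  # b > a, so index a stays valid
    return [mean(g) for g in groups]


def multi_centroid_set_kmeans(data, k, m, seed=0):
    """Pool the centroids of m independent k-means runs, then reduce to k."""
    rng = random.Random(seed)
    pooled = []
    for _ in range(m):
        pooled.extend(kmeans(data, k, rng))
    return merge_centroids(pooled, k)


# Hypothetical data: two loose groups of 2-D points.
data = [(0.0, 0.2), (0.5, 1.0), (1.1, 0.1), (9.8, 10.0), (10.4, 11.2), (11.0, 9.7)]
print(multi_centroid_set_kmeans(data, k=2, m=3))
```

In a distributed setting the m runs are independent of one another, which is what would make them natural candidates for parallel execution; how the paper actually maps the work onto MapReduce is not stated in the abstract.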

A Multi-Dimensional Issue Clustering from the Perspective Consumers' Interests and R&D (소비자 선호 이슈 및 R&D 관점에서의 다차원 이슈 클러스터링)

  • Hyun, Yoonjin;Kim, Namgyu;Cho, Yoonho
    • Journal of Information Technology Services / v.14 no.1 / pp.237-249 / 2015
  • The volume of unstructured text data generated by various social media has been increasing rapidly; therefore, the use of text mining to support decision making has also been increasing. In particular, issue clustering (determining new relations among various issues through clustering) has gained attention from many researchers. However, traditional issue clustering methods rely only on the co-occurrence frequency of issue keywords across documents. As a result, an association between issues with a low co-occurrence frequency cannot be discovered with traditional methods, even if those issues are strongly related from other perspectives. Issue clustering therefore needs to be performed with criteria suited to the perspective of analysis and the purpose of use. In this study, multi-dimensional issue clustering is proposed to overcome this limitation. Specifically, we assert that issue clustering should be performed for a particular purpose. We analyze the results of applying our methodology to two specific perspectives on issue clustering: (i) consumers' interests and (ii) related R&D terms.
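
To make the co-occurrence baseline concrete, here is a minimal sketch that counts, for each pair of issue keywords, the number of documents mentioning both. It illustrates only the traditional criterion the abstract criticizes, not the proposed multi-dimensional method; the function name and the sample keywords are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(documents):
    """Count, for each keyword pair, the number of documents containing both.

    documents -- iterable of keyword lists, one list per document
    Pairs are stored as alphabetically sorted tuples.
    """
    counts = Counter()
    for keywords in documents:
        for pair in combinations(sorted(set(keywords)), 2):
            counts[pair] += 1
    return counts


# Hypothetical documents, each reduced to its issue keywords.
docs = [
    ["battery", "price"],
    ["battery", "price", "design"],
    ["design", "camera"],
]
counts = cooccurrence_counts(docs)
print(counts.most_common())
```

Pairs that never appear in the same document, such as "battery" and "camera" here, get a count of zero, so a method based only on these counts cannot relate them even if they are connected from another perspective.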