• Title/Summary/Keyword: Modified K-means clustering

Search Result 60, Processing Time 0.024 seconds

Path based K-means Clustering for RFID Data Sets

  • Yun, Hong-Won
    • Journal of information and communication convergence engineering
    • /
    • v.6 no.4
    • /
    • pp.434-438
    • /
    • 2008
  • Massive data are continuously produced with a data rate of over several terabytes every day. These applications need effective clustering algorithms to achieve an overall high performance computation. In this paper, we propose ancestor as cluster center based approach to clustering, the K-means algorithm using ancestor. We modify the K-means algorithm. We present a clustering architecture and a clustering algorithm that minimize of I/Os and show a performance with excellent. In our experimental performance evaluation, we present that our algorithm can improve the I/O speed and the query processing time.

Inverted Index based Modified Version of K-Means Algorithm for Text Clustering

  • Jo, Tae-Ho
    • Journal of Information Processing Systems
    • /
    • v.4 no.2
    • /
    • pp.67-76
    • /
    • 2008
  • This research proposes a new strategy where documents are encoded into string vectors and modified version of k means algorithm to be adaptable to string vectors for text clustering. Traditionally, when k means algorithm is used for pattern classification, raw data should be encoded into numerical vectors. This encoding may be difficult, depending on a given application area of pattern classification. For example, in text clustering, encoding full texts given as raw data into numerical vectors leads to two main problems: huge dimensionality and sparse distribution. In this research, we encode full texts into string vectors, and modify the k means algorithm adaptable to string vectors for text clustering.

An Implementation of K-Means Algorithm Improving Cluster Centroids Decision Methodologies (클러스터 중심 결정 방법을 개선한 K-Means 알고리즘의 구현)

  • Lee Shin-Won;Oh HyungJin;An Dong-Un;Jeong Seong-Jong
    • The KIPS Transactions:PartB
    • /
    • v.11B no.7 s.96
    • /
    • pp.867-874
    • /
    • 2004
  • K-Means algorithm is a non-hierarchical (plat) and reassignment techniques and iterates algorithm steps on the basis of K cluster centroids until the clustering results converge into K clusters. In its nature, K-Means algorithm has characteristics which make different results depending on the initial and new centroids. In this paper, we propose the modified K-Means algorithm which improves the initial and new centroids decision methodologies. By evaluating the performance of two algorithms using the 16 weighting scheme of SMART system, the modified algorithm showed $20{\%}$ better results on recall and F-measure than those of K-Means algorithm, and the document clustering results are quite improved.

Data Clustering Method Using a Modified Gaussian Kernel Metric and Kernel PCA

  • Lee, Hansung;Yoo, Jang-Hee;Park, Daihee
    • ETRI Journal
    • /
    • v.36 no.3
    • /
    • pp.333-342
    • /
    • 2014
  • Most hyper-ellipsoidal clustering (HEC) approaches use the Mahalanobis distance as a distance metric. It has been proven that HEC, under this condition, cannot be realized since the cost function of partitional clustering is a constant. We demonstrate that HEC with a modified Gaussian kernel metric can be interpreted as a problem of finding condensed ellipsoidal clusters (with respect to the volumes and densities of the clusters) and propose a practical HEC algorithm that is able to efficiently handle clusters that are ellipsoidal in shape and that are of different size and density. We then try to refine the HEC algorithm by utilizing ellipsoids defined on the kernel feature space to deal with more complex-shaped clusters. The proposed methods lead to a significant improvement in the clustering results over K-means algorithm, fuzzy C-means algorithm, GMM-EM algorithm, and HEC algorithm based on minimum-volume ellipsoids using Mahalanobis distance.

Pruning Methodology for Reducing the Size of Speech DB for Corpus-based TTS Systems (코퍼스 기반 음성합성기의 데이터베이스 축소 방법)

  • 최승호;엄기완;강상기;김진영
    • The Journal of the Acoustical Society of Korea
    • /
    • v.22 no.8
    • /
    • pp.703-710
    • /
    • 2003
  • Because of their human-like synthesized speech quality, recently Corpus-Based Text-To-Speech(CB-TTS) have been actively studied worldwide. However, due to their large size speech database (DB), their application is very restricted. In this paper we propose and evaluate three DB reduction algorithms to which are designed to solve the above drawback. The first method is based on a K-means clustering approach, which selects k-representatives among multiple instances. The second method is keeping only those unit instances that are selected during synthesis, using a domain-restricted text as input to the synthesizer. The third method is a kind of hybrid approach of the above two methods and is using a large text as input in the system. After synthesizing the given sentences, the used unit instances and their occurrence information is extracted. As next step a modified K-means clustering is applied, which takes into account also the occurrence information of the selected unit instances, Finally we compare three pruning methods by evaluating the synthesized speech quality for the similar DB reduction rate, Based on perceptual listening tests, we concluded that the last method shows the best performance among three algorithms. More than this, the results show that the last method is able to reduce DB size without speech quality looses.

Zone Clustering Using a Genetic Algorithm and K-Means (유전자 알고리듬과 K-평균법을 이용한 지역 분할)

  • 임동순;오현승
    • Journal of the Korean Operations Research and Management Science Society
    • /
    • v.23 no.1
    • /
    • pp.1-16
    • /
    • 1998
  • The zone clustering problem arising from several area such as deciding the optimal location of ambient measuring stations is to devide the 2-dimensional area into several sub areas in which included individual zone shows simimlar properties. In general, the optimal solution of this problem is very hard to obtain. Therefore, instead of finding an optimal solution, the generation of near optimal solution within the limited time is more meaningful. In this study, the combination of a genetic algorithm and the modified k-means method is used to obtain the near optimal solution. To exploit the genetic algorithm effectively, a representation of chromsomes and appropriate genetic operators are proposed. The k-means method which is originally devised to solve the object clustering problem is modified to improve the solutions obtained from the genetic algorithm. The experiment shows that the proposed method generates the near optimal solution efficiently.

  • PDF

Clustering Gene Expression Data by MCL Algorithm (MCL 알고리즘을 사용한 유전자 발현 데이터 클러스터링)

  • Shon, Ho-Sun;Ryu, Keun-Ho
    • Journal of the Institute of Electronics Engineers of Korea CI
    • /
    • v.45 no.4
    • /
    • pp.27-33
    • /
    • 2008
  • The clustering of gene expression data is used to analyze the results of microarray studies. This clustering is one of the frequently used methods in understanding degrees of biological change and gene expression. In biological research, MCL algorithm is an algorithm that clusters nodes within a graph, and is quick and efficient. We have modified the existing MCL algorithm and applied it to microarray data. In applying the MCL algorithm we put forth a simulation that adjusts two factors, namely inflation and diagonal tent and converted them by making use of Markov matrix. Furthermore, in order to distinguish class more clearly in the modified MCL algorithm we took the average of each row and used it as a threshold. Therefore, the improved algorithm can increase accuracy better than the existing ones. In other words, in the actual experiment, it showed an average of 70% accuracy when compared with an existing class. We also compared the MCL algorithm with the self-organizing map(SOM) clustering, K-means clustering and hierarchical clustering (HC) algorithms. And the result showed that it showed better results than ones derived from hierarchical clustering and K-means method.

Clustering Algorithm for Time Series with Similar Shapes

  • Ahn, Jungyu;Lee, Ju-Hong
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.12 no.7
    • /
    • pp.3112-3127
    • /
    • 2018
  • Since time series clustering is performed without prior information, it is used for exploratory data analysis. In particular, clusters of time series with similar shapes can be used in various fields, such as business, medicine, finance, and communications. However, existing time series clustering algorithms have a problem in that time series with different shapes are included in the clusters. The reason for such a problem is that the existing algorithms do not consider the limitations on the size of the generated clusters, and use a dimension reduction method in which the information loss is large. In this paper, we propose a method to alleviate the disadvantages of existing methods and to find a better quality of cluster containing similarly shaped time series. In the data preprocessing step, we normalize the time series using z-transformation. Then, we use piecewise aggregate approximation (PAA) to reduce the dimension of the time series. In the clustering step, we use density-based spatial clustering of applications with noise (DBSCAN) to create a precluster. We then use a modified K-means algorithm to refine the preclusters containing differently shaped time series into subclusters containing only similarly shaped time series. In our experiments, our method showed better results than the existing method.

Fast Outlier Removal for Image Registration based on Modified K-means Clustering

  • Soh, Young-Sung;Qadir, Mudasar;Kim, In-Taek
    • Journal of the Institute of Convergence Signal Processing
    • /
    • v.16 no.1
    • /
    • pp.9-14
    • /
    • 2015
  • Outlier detection and removal is a crucial step needed for various image processing applications such as image registration. Random Sample Consensus (RANSAC) is known to be the best algorithm so far for the outlier detection and removal. However RANSAC requires a cosiderable computation time. To drastically reduce the computation time while preserving the comparable quality, a outlier detection and removal method based on modified K-means is proposed. The original K-means was conducted first for matching point pairs and then cluster merging and member exclusion step are performed in the modification step. We applied the methods to various images with highly repetitive patterns under several geometric distortions and obtained successful results. We compared the proposed method with RANSAC and showed that the proposed method runs 3~10 times faster than RANSAC.

SUPPORT VECTOR MACHINE USING K-MEANS CLUSTERING

  • Lee, S.J.;Park, C.;Jhun, M.;Koo, J.Y.
    • Journal of the Korean Statistical Society
    • /
    • v.36 no.1
    • /
    • pp.175-182
    • /
    • 2007
  • The support vector machine has been successful in many applications because of its flexibility and high accuracy. However, when a training data set is large or imbalanced, the support vector machine may suffer from significant computational problem or loss of accuracy in predicting minority classes. We propose a modified version of the support vector machine using the K-means clustering that exploits the information in class labels during the clustering process. For large data sets, our method can save the computation time by reducing the number of data points without significant loss of accuracy. Moreover, our method can deal with imbalanced data sets effectively by alleviating the influence of dominant class.