• Title/Summary/Keyword: K-means algorithm

Nonlinear Process Modeling Using Hard Partition-based Inference System (Hard 분산 분할 기반 추론 시스템을 이용한 비선형 공정 모델링)

  • Park, Keon-Jun;Kim, Yong-Kab
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology / v.7 no.4 / pp.151-158 / 2014
  • In this paper, we introduce an inference system based on the hard scatter partition method and use it to model a nonlinear process. The hard scatter partition method partitions the input space in scatter form, with membership degrees of either 0 or 1. The proposed method is implemented with the C-Means clustering algorithm; because that algorithm is sensitive to its initial center values, the LBG algorithm is applied to obtain the initial centers by binary splitting. The hard-scatter-partitioned input space forms the rules of the rule-based system model. The premise parameters of the rules are determined from the membership matrix produced by the C-Means clustering algorithm. The consequence part of each rule is expressed as a polynomial function, and the coefficient parameters of each rule are determined by the standard least-squares method. Data widely used in nonlinear process studies are used to model the nonlinear process and evaluate its characteristics.
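
The partitioning step described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names (`nearest`, `hard_c_means`, `lbg_init`, `membership_matrix`), the additive perturbation `eps`, and the toy data are all assumptions, and the least-squares fit of the rule consequents is left out.

```python
def nearest(point, centers):
    """Index of the center closest to point (squared Euclidean distance)."""
    return min(range(len(centers)),
               key=lambda j: sum((p - c) ** 2 for p, c in zip(point, centers[j])))

def hard_c_means(data, centers, iters=50):
    """Hard C-means: each point belongs to exactly one cluster (membership 0 or 1)."""
    for _ in range(iters):
        labels = [nearest(x, centers) for x in data]
        updated = []
        for j, old in enumerate(centers):
            members = [x for x, lab in zip(data, labels) if lab == j]
            # Keep the old center if a cluster lost all of its members.
            updated.append([sum(col) / len(members) for col in zip(*members)]
                           if members else list(old))
        if updated == centers:
            break
        centers = updated
    return centers, [nearest(x, centers) for x in data]

def lbg_init(data, c, eps=0.01):
    """LBG binary splitting: start from the global mean and double the set of
    centers until it holds c of them (c is assumed to be a power of two)."""
    centers = [[sum(col) / len(data) for col in zip(*data)]]
    while len(centers) < c:
        split = []
        for ctr in centers:
            split.append([v + eps for v in ctr])  # assumed additive perturbation
            split.append([v - eps for v in ctr])
        centers, _ = hard_c_means(data, split)
    return centers

def membership_matrix(labels, c):
    """c x n matrix U with U[j][i] = 1 if point i is in cluster j, else 0."""
    return [[1 if lab == j else 0 for lab in labels] for j in range(c)]

# Toy data: two well-separated groups.
data = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
centers, labels = hard_c_means(data, lbg_init(data, 2))
U = membership_matrix(labels, 2)
```

Each column of `U` sums to 1, which is the hard-partition property; the rows of `U` would then select the data used to fit each rule's polynomial consequent.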

A Study on Customer rating using RFM and K-Means (RFM 기법과 K-Means 알고리즘을 이용한 고객 분류)

  • Ji, Hyunjung;Shin, Gyeongil;Shin, Dongil;Shin, Dongkyoo
    • Proceedings of the Korea Information Processing Society Conference / 2017.11a / pp.803-806 / 2017
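  • RFM (Recency, Frequency, Monetary), used to analyze customer behavior, is a market analysis technique widely used in marketing. As the volume of accumulated data has grown, interest in machine learning as a way to exploit it has increased. Accordingly, attempts are being made to analyze data by combining the RFM technique with various algorithms. In this paper, we experimented with a method of grading customers using the RFM technique and k-means, a representative clustering algorithm. Previous experiments often set k to 8 or 9. In this experiment, however, we used an internal evaluation method to find the optimal k for each data set, and all four data sets used gave the same result, k = 3.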
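
A rough sketch of the pipeline this abstract describes is given below: build RFM features, cluster them with k-means for several values of k, and keep the k with the best internal score. The abstract does not name its internal evaluation method, so the silhouette coefficient used here is an assumption, as are all function names, the evenly spaced k-means initialization, and the `(customer, date, amount)` transaction layout.

```python
from datetime import date
from math import dist

def rfm_table(transactions, today):
    """Map customer -> (recency in days, frequency, monetary total)."""
    table = {}
    for customer, day, amount in transactions:
        last, freq, total = table.get(customer, (day, 0, 0.0))
        table[customer] = (max(last, day), freq + 1, total + amount)
    return {c: ((today - last).days, freq, total)
            for c, (last, freq, total) in table.items()}

def kmeans(points, k, iters=100):
    """Plain k-means; centers start at evenly spaced points for repeatability."""
    centers = [list(points[i * len(points) // k]) for i in range(k)]
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: dist(p, centers[j])) for p in points]
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            updated.append([sum(c) / len(members) for c in zip(*members)]
                           if members else centers[j])
        if updated == centers:
            break
        centers = updated
    return [min(range(k), key=lambda j: dist(p, centers[j])) for p in points]

def silhouette(points, labels):
    """Mean silhouette coefficient; higher means tighter, better separated clusters."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        return -1.0
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # convention for singleton clusters
            continue
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        b = min(sum(dist(p, q) for q in other) / len(other)
                for key, other in clusters.items() if key != lab)
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)

def best_k(points, candidates):
    """Candidate k whose k-means labeling has the highest silhouette score."""
    return max(candidates, key=lambda k: silhouette(points, kmeans(points, k)))
```

In practice the three RFM columns would be rescaled to a common range before clustering, so that the monetary total does not dominate the distance.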

A study on image segmentation for depth map generation (깊이정보 생성을 위한 영상 분할에 관한 연구)

  • Lim, Jae Sung
    • Journal of the Korea Academia-Industrial cooperation Society / v.18 no.10 / pp.707-716 / 2017
  • Advances in image display devices call for images suited to the user's purpose. Display devices should be able to provide object-based image information when a depth map is required. In this paper, we present an algorithm that uses a histogram-based image segmentation method for depth map generation. In the conventional K-means clustering algorithm, the number of centroids is a fixed parameter, so existing K-means algorithms cannot adaptively determine the number of clusters. Furthermore, the K-means algorithm tends to get trapped in local minima, which causes over-segmentation. The proposed algorithm, in contrast, selects centroids adaptively and builds on a histogram-based approach to limit computational complexity. It is designed to produce object-based results by keeping the algorithm from falling into local minima. Finally, we remove over-segmented components with a connected-component labeling algorithm. The proposed algorithm produces object-based results and improves on the benchmark method by 0.017 and 0.051 in terms of Probabilistic Rand Index (PRI) and Segmentation Covering (SC), respectively.
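
For readers unfamiliar with the first evaluation metric, here is a small sketch of the Probabilistic Rand Index computed as the mean Rand index of a segmentation against several ground-truth labelings. The function names and the flat label-list representation of a segmentation are assumptions, and the quadratic pair enumeration is only practical for tiny examples.

```python
from itertools import combinations

def rand_index(a, b):
    """Fraction of pixel pairs on which two labelings agree: both put the
    pair in the same segment, or both put it in different segments."""
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)

def probabilistic_rand_index(segmentation, ground_truths):
    """Mean Rand index of one segmentation against several ground truths."""
    return sum(rand_index(segmentation, gt) for gt in ground_truths) / len(ground_truths)
```

The score does not depend on how segments are numbered, so `[0, 0, 1, 1]` and `[1, 1, 0, 0]` count as the same segmentation.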

Comparison of Document Clustering algorithm using Genetic Algorithms by Individual Structures (개체 구조에 따른 유전자 알고리즘 기반의 문서 클러스터링 성능 비교)

  • Choi, Lim-Cheon;Song, Wei;Park, Soon-Cheol
    • Journal of Korea Society of Industrial Information Systems / v.16 no.3 / pp.47-56 / 2011
  • Applying a genetic algorithm to document clustering requires an appropriate individual structure. Document clustering with the genetic algorithm (DCGA) uses a centroid-vector individual structure. New document clustering with the genetic algorithm (NDCGA) uses a document-allocated individual structure. In this paper, to find a more suitable individual structure and process for document clustering, we analyzed the differences between the two methods in amount of computation, run time, and performance. We performed various experiments using both DCGA and NDCGA. The results show that, compared to DCGA, NDCGA ran 15% faster and performed about 5~10% better. This indicates that the document-allocated structure is better suited to document clustering than the centroid-vector structure. In addition, NDCGA performed 15~25% better than the traditional clustering algorithms (K-means, Group Average).

Context-awareness User Analysis based on Clustering Algorithm (클러스터링 알고리즘기반의 상황인식 사용자 분석)

  • Lee, Kang-whan
    • Journal of the Korea Institute of Information and Communication Engineering / v.24 no.7 / pp.942-948 / 2020
  • In this paper, we propose a clustering algorithm that distinguishes users within clusters more efficiently by using context-aware attribute information. Typically, when data are clustered, the data used to classify interrelationships within a cluster can degrade the result if new or newly processed information is treated as contaminated information in the comparison. To solve this problem, we developed a clustering algorithm, built on the K-means algorithm, that can extract users' recognition information. The proposed algorithm analyzes the clustering attribute parameters of users in each cluster from accumulated information and clusters the users according to those attributes. Simulation results with the proposed algorithm showed that the user management system was more adaptable in classifying and maintaining multiple users in clusters.

An Efficient Method to Compute a Covariance Matrix of the Non-local Means Algorithm for Image Denoising with the Principal Component Analysis (영상 잡음 제거를 위한 주성분 분석 기반 비 지역적 평균 알고리즘의 효율적인 공분산 행렬 계산 방법)

  • Kim, Jeonghwan;Jeong, Jechang
    • Journal of Broadcast Engineering / v.21 no.1 / pp.60-65 / 2016
  • This paper introduces the non-local means (NLM) algorithm for image denoising, along with an improved algorithm based on principal component analysis (PCA). To perform the PCA, a covariance matrix of the given image must be evaluated first. If the neighborhood patches of the NLM have size S × S and the image has Q pixels, computing the covariance matrix requires a matrix multiplication of size S² × Q. Given the characteristics of images, such a computation is inefficient. Therefore, this paper proposes an efficient method that computes the covariance matrix from sampled pixels. After sampling, the covariance matrix can be computed with matrices of size S² × floor(Width/l) × (Height/l).
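
The idea of estimating the patch covariance from a subsample can be sketched as follows. This is an illustration under stated assumptions rather than the paper's method: `patches`, `covariance`, and the `step` parameter (standing in for the sampling interval written as l in the abstract) are hypothetical names, and only patches that fit entirely inside the image are used.

```python
def patches(image, s, step=1):
    """Flatten every s x s patch whose top-left corner lies on a grid with the
    given step; step=1 uses every position, step>1 subsamples them."""
    h, w = len(image), len(image[0])
    out = []
    for r in range(0, h - s + 1, step):
        for c in range(0, w - s + 1, step):
            out.append([image[r + i][c + j] for i in range(s) for j in range(s)])
    return out

def covariance(vectors):
    """Sample covariance matrix (d x d) of a list of d-dimensional vectors."""
    n, d = len(vectors), len(vectors[0])
    mean = [sum(v[k] for v in vectors) / n for k in range(d)]
    centered = [[v[k] - mean[k] for k in range(d)] for v in vectors]
    return [[sum(x[a] * x[b] for x in centered) / (n - 1) for b in range(d)]
            for a in range(d)]

# Toy 8 x 8 image with 3 x 3 patches, so the covariance matrix is 9 x 9.
image = [[(7 * r + 3 * c) % 11 for c in range(8)] for r in range(8)]
full = covariance(patches(image, 3))             # built from all 36 patches
sampled = covariance(patches(image, 3, step=2))  # built from 9 sampled patches
```

Both matrices have the same S² × S² shape; only the number of patch vectors entering the sums shrinks, which is where the saving comes from.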

Support Vector Data Description using Mean Shift Clustering (평균 이동 알고리즘 기반의 지지 벡터 영역 표현 방법)

  • Chang, Hyung-Jin;Kim, Pyo-Jae;Choi, Jung-Hwan;Choi, Jin-Young
    • Proceedings of the KIEE Conference / 2007.04a / pp.307-309 / 2007
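  • To address the scale problem of SVDD, an SVDD method using K-means clustering (KMSVDD) was proposed; it shortens training time by dividing the training data into sub-groups and training an SVDD on each group. However, KMSVDD has two problems: the nature of the K-means clustering algorithm makes it hard to choose the optimal K, and because the clustered groups form randomly, the training result differs from run to run even on the same data. In addition, because the data are always divided into K elliptic clusters regardless of their distribution, the resulting clusters struggle to capture the characteristics of the data distribution. To solve these problems, this paper proposes an SVDD that uses Mean Shift clustering, which first finds the modes of the data distribution and then clusters around those modes. Compared with KMSVDD, the proposed algorithm shows no large difference in training speed, yet because it trains on sub-groups clustered according to the data distribution, training accuracy is consistent and each cluster captures the characteristics of the data distribution. Moreover, unlike K in K-means, the bandwidth of the Mean Shift kernel can be chosen with some latitude without changing the training result. These claims are verified through simulations on a variety of data sets.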
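
The mode-seeking step can be illustrated with a minimal Mean Shift sketch. It assumes a Gaussian kernel and a merge radius of half the bandwidth, neither of which is stated in the abstract; the function name `mean_shift` is hypothetical, and the per-group SVDD training is not shown.

```python
from math import dist, exp

def mean_shift(points, bandwidth, iters=100, tol=1e-6):
    """Move each point uphill to a density mode, then group points by mode.
    Returns (modes, labels); the number of clusters is not fixed in advance."""
    converged = []
    for p in points:
        x = list(p)
        for _ in range(iters):
            # Gaussian kernel weight of every data point relative to x.
            weights = [exp(-0.5 * (dist(x, q) / bandwidth) ** 2) for q in points]
            total = sum(weights)
            shifted = [sum(w * q[k] for w, q in zip(weights, points)) / total
                       for k in range(len(x))]
            moved = dist(x, shifted)
            x = shifted
            if moved < tol:
                break
        converged.append(x)
    modes, labels = [], []
    for m in converged:
        for j, c in enumerate(modes):
            if dist(m, c) < bandwidth / 2:  # assumed merge radius
                labels.append(j)
                break
        else:
            modes.append(m)
            labels.append(len(modes) - 1)
    return modes, labels
```

Because the procedure is deterministic for a given bandwidth, repeated runs on the same data give the same sub-groups, which is the property the abstract contrasts with randomly initialized K-means.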


Privacy-Preserving k-means Clustering of Encrypted Data (암호화된 데이터에 대한 프라이버시를 보존하는 k-means 클러스터링 기법)

  • Jeong, Yunsong;Kim, Joon Sik;Lee, Dong Hoon
    • Journal of the Korea Institute of Information Security & Cryptology / v.28 no.6 / pp.1401-1414 / 2018
  • The k-means clustering algorithm partitions input data into k groups. It is particularly useful in market segmentation and medical research, which suggests its wide applicability. In this paper, we propose a privacy-preserving clustering algorithm that is appropriate for outsourced encrypted data and exposes no information about the input data itself. Notably, our proposed model allows all data to be encrypted, which is a large advantage over existing privacy-preserving clustering algorithms that rely on multi-party computation over plaintext data stored on several servers. Our approach compares homomorphically encrypted ciphertexts to measure the distance between input data. Finally, we theoretically prove that our scheme guarantees the security of input data during computation, and we evaluate its communication and computation complexity in detail.

Energy Efficient Cluster Routing Method Using Machine Learning in WSN (무선 센서 네트워크에서의 머신러닝을 활용한 에너지 효율적인 클러스터 라우팅 방안 연구)

  • Mi-Young, Kang
    • Journal of the Korea Institute of Information and Communication Engineering / v.27 no.1 / pp.124-130 / 2023
  • In this paper, we aim to extend network lifetime by improving the energy efficiency of sensor nodes in a wireless sensor network, using machine learning in the form of the K-means clustering algorithm. A wireless sensor network is a wireless network composed of battery-powered physical sensor devices. Given the characteristics of sensor nodes, all resources must be used efficiently to minimize energy consumption and maximize network lifetime. A cluster-based approach is used to manage groups of relatively large numbers of nodes. In the proposed protocol, we improve the existing LEACH algorithm with a clustering algorithm that selects a cluster head using both a cluster-based approach and a location-based approach. Performance was measured through Matlab simulation. In the experiments, K-means clustering was applied to the energy-efficiency component, and the results confirm that using K-means improves energy efficiency and extends the lifetime of the entire network.

An Algorithms for Tournament-based Big Data Analysis (토너먼트 기반의 빅데이터 분석 알고리즘)

  • Lee, Hyunjin
    • Journal of Digital Contents Society / v.16 no.4 / pp.545-553 / 2015
  • While all data has value in itself, most data collected in the real world is random and unstructured. Extracting useful information from such data requires data transformation and analysis algorithms, and data mining is used for this purpose. Today there is a need not only for a variety of data mining techniques but also for the computational capacity and fast analysis that huge volumes of data demand. Hadoop is commonly used to store huge volumes of data, and data in Hadoop is analyzed with the MapReduce framework. In this paper, we developed a tournament-based MapReduce method for efficiently porting an algorithm developed on a single machine to the MapReduce framework. The proposed method can be applied to many analysis algorithms, and we demonstrate its usefulness by applying it to two frequently used data mining algorithms, k-means and k-nearest neighbor classification.
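
The abstract does not spell out what "tournament-based" means, so the sketch below shows only one possible reading, labeled as an assumption: each data chunk is mapped to per-cluster partial sums, and the partial results are merged pairwise, round by round, like a tournament bracket. It runs in plain Python rather than on Hadoop, and all function names are hypothetical.

```python
from math import dist

def map_chunk(chunk, centers):
    """Map step: per-cluster coordinate sums and counts for one data chunk."""
    dim = len(centers[0])
    stats = [([0.0] * dim, 0) for _ in centers]
    for p in chunk:
        j = min(range(len(centers)), key=lambda i: dist(p, centers[i]))
        sums, n = stats[j]
        stats[j] = ([s + v for s, v in zip(sums, p)], n + 1)
    return stats

def merge(a, b):
    """Combine two partial results cluster by cluster."""
    return [([x + y for x, y in zip(sa, sb)], na + nb)
            for (sa, na), (sb, nb) in zip(a, b)]

def tournament_reduce(partials):
    """Assumed tournament scheme: merge neighbors pairwise until one remains;
    an unpaired partial result advances to the next round unchanged."""
    while len(partials) > 1:
        merged = [merge(partials[i], partials[i + 1])
                  for i in range(0, len(partials) - 1, 2)]
        if len(partials) % 2:
            merged.append(partials[-1])
        partials = merged
    return partials[0]

def kmeans_step(chunks, centers):
    """One k-means update computed from chunked data."""
    stats = tournament_reduce([map_chunk(c, centers) for c in chunks])
    return [[s / n for s in sums] if n else list(old)
            for (sums, n), old in zip(stats, centers)]
```

Because sums and counts can be merged in any order, the update matches what a single-machine pass over all the data would produce, which is what makes this kind of port possible.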