• Title/Summary/Keyword: k-means

An Implementation of Clustering Method using K-Means Algorithm on Multi-Dimensional Data (K-Means 알고리즘을 이용한 다차원 데이터 클러스터링 기법 구현)

  • Ihm, Sun-Young;Shin, HyunSoon;Park, Young-Ho
    • Proceedings of the Korea Information Processing Society Conference / 2013.11a / pp.1132-1134 / 2013
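  • K-Means clustering is one of the most widely used methods in the clustering area of data mining; it partitions a given data set around k clusters. Recent data requires multiple attributes to be considered. This paper therefore introduces the K-Means clustering technique and presents an experiment that applies it to multi-dimensional data in order to take multiple attributes into account.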
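
The partitioning described in this abstract can be illustrated with a minimal sketch of Lloyd-style k-means on points with several attributes. It is not the authors' implementation: the function names, the fixed random seed, the rule for empty clusters, and the sample data are all assumptions made for illustration.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd-style k-means; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # k input points chosen at random
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # assumption: an empty cluster keeps its old centroid
                centroids[j] = mean_point(members)
    return centroids, labels


# Hypothetical three-attribute data forming two well-separated groups.
data = [
    (1.0, 1.0, 1.0), (1.2, 0.8, 1.1), (0.9, 1.1, 0.9),
    (8.0, 8.0, 8.0), (8.2, 7.9, 8.1), (7.9, 8.1, 7.8),
]
centroids, labels = kmeans(data, k=2)
```

Because distances are computed over whole tuples, the same code handles any number of attributes, provided every point has the same length.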

Bayesian One-Sided Testing for the Ratio of Poisson Means

  • Kang, Sang-Gil;Kim, Dal-Ho;Lee, Woo-Dong
    • Journal of the Korean Data and Information Science Society / v.17 no.2 / pp.619-631 / 2006
  • When X and Y have independent Poisson distributions, we develop Bayesian one-sided testing procedures for the ratio of two Poisson means. We propose objective Bayesian one-sided testing procedures for the ratio of two Poisson means based on the fractional Bayes factor and the intrinsic Bayes factor. Some real examples are provided.

Bayesian Hypothesis Testing for the Ratio of Exponential Means

  • Kang, Sang-Gil;Kim, Dal-Ho;Lee, Woo-Dong
    • Journal of the Korean Data and Information Science Society / v.17 no.4 / pp.1387-1395 / 2006
  • This paper considers testing for the ratio of two exponential means. We propose a solution to this problem based on a Bayesian decision rule in which no subjective input is considered. The criterion for testing is the Bayesian reference criterion (Bernardo, 1999). We derive the Bayesian reference criterion for testing the ratio of two exponential means. A simulation study and a real data example are provided.

Bayesian One-Sided Testing for the Ratio of Poisson Means

  • Kang, Sang-Gil;Kim, Dal-Ho;Lee, Woo-Dong
    • Proceedings of the Korean Data and Information Science Society Conference / 2006.04a / pp.295-306 / 2006
  • When X and Y have independent Poisson distributions, we develop Bayesian one-sided testing procedures for the ratio of two Poisson means. We propose objective Bayesian one-sided testing procedures for the ratio of two Poisson means based on the fractional Bayes factor and the intrinsic Bayes factor. Some real examples are provided.

Approximate moments of a variance estimate with imputed conditional means

  • Kang Woo Ram;Shin Min Woong;Lee Sang Eum
    • Proceedings of the Korean Statistical Society Conference / 2001.11a / pp.179-184 / 2001
  • Schafer and Schenker (2000) mentioned an analytic imputation technique involving conditional means. We derive approximate moments of a variance estimate with imputed conditional means.

A Study of Similar Blog Recommendation System Using Termite Colony Algorithm (흰개미 군집 알고리즘을 이용한 유사 블로그 추천 시스템에 관한 연구)

  • Jeong, Gi Sung;Jo, I-Seok;Lee, Malrey
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.13 no.1 / pp.83-88 / 2013
  • This paper proposes a system that recommends similar blogs, grouping blogs by a similarity computed from the word frequencies of each individual blog. To improve clustering performance, it modifies the k-means algorithm using a model of termite behavior, and an evaluation against the existing k-means algorithm shows that the improved algorithm clusters better. The similar-blog recommendation system was designed and implemented using the improved algorithm. Compared with the k-means algorithm, the Termite Colony Algorithm (TCA) can reduce clustering time and the number of moves required for clustering.

Design and Implementation of Distributed In-Memory DBMS-based Parallel K-Means as In-database Analytics Function (분산 인 메모리 DBMS 기반 병렬 K-Means의 In-database 분석 함수로의 설계와 구현)

  • Kou, Heymo;Nam, Changmin;Lee, Woohyun;Lee, Yongjae;Kim, HyoungJoo
    • KIISE Transactions on Computing Practices / v.24 no.3 / pp.105-112 / 2018
  • As data size increases, a single database is not enough to serve the current volume of tasks. Since data is partitioned and stored across multiple databases, analysis should also support parallelism in order to increase efficiency. However, traditional analysis requires data to be transferred out of the database to the nodes where the analytic service runs, and the user is required to know both the database and the analytic framework. In this paper, we propose an efficient way to perform the K-means clustering algorithm inside a distributed column-based database and a relational database. We also suggest an efficient way to optimize the K-means algorithm within a relational database.

The Document Clustering using Multi-Objective Genetic Algorithms (다목적 유전자 알고리즘을 이용한 문서 클러스터링)

  • Lee, Jung-Song;Park, Soon-Cheol
    • Journal of Korea Society of Industrial Information Systems / v.17 no.2 / pp.57-64 / 2012
  • In this paper, a multi-objective genetic algorithm is proposed for document clustering, which is important in the text mining field. The most important function of a document clustering algorithm is to group similar documents in a corpus. So far, k-means clustering and genetic algorithms have been widely studied in this field. However, k-means clustering depends too much on the initial centroids, and the genetic algorithm tends to fall easily into a local optimum depending on the fitness function. In this paper, the multi-objective genetic algorithm is applied to document clustering to compensate for these disadvantages, and its accuracy is analyzed and compared with that of the existing algorithms. In our experimental results, the multi-objective genetic algorithm introduced in this paper improves accuracy over k-means clustering (by about 20%) and the general genetic algorithm (by about 17%) for document clustering.

User's Individuality Preference Recommendation System using Improved k-means Algorithm (개선된 k-means 알고리즘을 적용한 사용자 특성 선호도 추천 시스템)

  • Ahn, Chan-Shik;Oh, Sang-Yeob
    • Journal of the Korea Society of Computer and Information / v.15 no.8 / pp.141-148 / 2010
  • In mobile terminals, recommendation service systems are limited to recommending general information and do not find and recommend information that takes the user's individuality into account. Because they do not offer a recommendation service based on the user's individual preferences, they also have difficulty recommending accurate information. Therefore, this paper proposes a user individuality preference recommendation system that uses an improved k-means algorithm to take the user's individual preferences into account. The proposed method applies correlation coefficients to the user's individual preference information when making recommendations with the improved k-means algorithm. This addresses the problem of restrictive, general-information recommendation by using the user's individual preferences to recommend accurate information. In a performance experiment comparing the system with an existing service system on precision and recall, it achieved a precision of 85% and a recall of 68%.

Comparison of Initial Seeds Methods for K-Means Clustering (K-Means 클러스터링에서 초기 중심 선정 방법 비교)

  • Lee, Shinwon
    • Journal of Internet Computing and Services / v.13 no.6 / pp.1-8 / 2012
  • Clustering methods are divided into hierarchical clustering, partitioning clustering, and others. The K-Means algorithm is a partitioning clustering method and is well suited to clustering large numbers of documents quickly and easily. Its disadvantage is that random initial centers produce different results. A better choice is therefore to place the centers as far from each other as possible. We propose a new method of selecting initial centers in K-Means clustering that uses triangle height to place the initial cluster centers. The centers are then distributed evenly, and the result is more accurate than with randomly selected initial centers. The selection itself is time-consuming, but it reduces total clustering time by minimizing the number of allocation and recalculation steps. Compared with the standard algorithm, average time consumed is reduced by 38.4%.
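
The idea of placing initial centers as far from each other as possible can be illustrated with a greedy farthest-first (maximin) traversal. This is a generic stand-in, not the triangle-height method the abstract proposes, whose details are not given here. The function name, the choice of the first point as the starting center, and the sample data are assumptions.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first_centers(points, k):
    """Pick k initial centers by greedy farthest-first (maximin) traversal."""
    centers = [points[0]]  # assumption: start from the first point
    # Squared distance from every point to its nearest chosen center so far.
    nearest = [squared_distance(p, centers[0]) for p in points]
    while len(centers) < k:
        # The next center is the point farthest from all chosen centers.
        idx = max(range(len(points)), key=lambda i: nearest[i])
        centers.append(points[idx])
        # Refresh the nearest-center distances with the new center included.
        nearest = [
            min(d, squared_distance(p, points[idx]))
            for p, d in zip(points, nearest)
        ]
    return centers


# Hypothetical 2-D points: two tight pairs and one outlier.
docs = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.1, 4.9), (0.0, 9.0)]
initial = farthest_first_centers(docs, k=3)
```

The selection is deterministic for a given input order, which removes the run-to-run variation of random seeding, though it is sensitive to outliers because the farthest point is always chosen.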