• Title/Summary/Keyword: K-mean Clustering

An Overview of Unsupervised and Semi-Supervised Fuzzy Kernel Clustering

  • Frigui, Hichem;Bchir, Ouiem;Baili, Naouel
    • International Journal of Fuzzy Logic and Intelligent Systems / v.13 no.4 / pp.254-268 / 2013
  • For real-world clustering tasks, the input data is typically not easily separable, either because the data structure is highly complex or because clusters vary in size, density, and shape. Kernel-based clustering has proven to be an effective approach to partition such data. In this paper, we provide an overview of several fuzzy kernel clustering algorithms. We focus on methods that optimize a fuzzy C-means-type objective function. We highlight the advantages and disadvantages of each method. In addition to the completely unsupervised algorithms, we also provide an overview of some semi-supervised fuzzy kernel clustering algorithms. These algorithms use partial supervision information to guide the optimization process and avoid local minima. We also provide an overview of the different approaches that have been used to extend kernel clustering to handle very large data sets.
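For orientation, the sketch below shows the standard (non-kernel) fuzzy C-means alternating update that the surveyed objective functions generalize. It is a minimal illustration and not any specific algorithm from the paper. The function names, the fuzzifier `m = 2.0`, and the fixed iteration count are assumptions made for this example. Kernel variants would typically replace the Euclidean distance used here with a kernel-induced distance.

```python
import math

def fcm_memberships(points, centers, m=2.0):
    """Standard FCM membership update: u[i][k] is the degree to which
    point k belongs to cluster i. Each column sums to 1."""
    u = [[0.0] * len(points) for _ in centers]
    power = 2.0 / (m - 1.0)
    for k, x in enumerate(points):
        d = [math.dist(x, c) for c in centers]
        zeros = [i for i, dist in enumerate(d) if dist == 0.0]
        if zeros:
            # The point coincides with one or more centers:
            # share full membership among those centers.
            for i in zeros:
                u[i][k] = 1.0 / len(zeros)
            continue
        for i in range(len(centers)):
            u[i][k] = 1.0 / sum((d[i] / d[j]) ** power for j in range(len(centers)))
    return u

def fcm_centers(points, u, m=2.0):
    """Center update: mean of the points weighted by membership ** m."""
    dim = len(points[0])
    centers = []
    for row in u:
        w = [val ** m for val in row]
        total = sum(w)
        centers.append(tuple(
            sum(wk * p[a] for wk, p in zip(w, points)) / total
            for a in range(dim)
        ))
    return centers

def fuzzy_c_means(points, centers, m=2.0, iters=50):
    """Alternate the two updates for a fixed number of iterations
    (an assumption of this sketch; a tolerance test is also common)."""
    for _ in range(iters):
        u = fcm_memberships(points, centers, m)
        centers = fcm_centers(points, u, m)
    return centers, fcm_memberships(points, centers, m)
```

Unlike hard k-means, every point keeps a graded membership in every cluster, which is what lets partial supervision or kernel distances be folded into the same objective.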

A Study on the Gen Expression Data Analysis Using Fuzzy Clustering

  • Choi, Hang-Suk;Cha, Kyung-Joon;Park, Hong-Goo
    • Proceedings of the Korean Statistical Society Conference / 2005.05a / pp.25-29 / 2005
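  • Advances in microarray technology have made it possible to identify the functions, interrelationships, and characteristics of genes, and a variety of analysis techniques have been introduced for this purpose. The fuzzy clustering technique presented in this study is an unsupervised method among the techniques most widely used for expression analysis in genomics. We applied fuzzy clustering to yeast expression data and compared the resulting classification with that of hard k-means.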

A Study on Data Clustering Method Using Local Probability (국부 확률을 이용한 데이터 분류에 관한 연구)

  • Son, Chang-Ho;Choi, Won-Ho;Lee, Jae-Kook
    • Journal of Institute of Control, Robotics and Systems / v.13 no.1 / pp.46-51 / 2007
  • In this paper, we propose a new data clustering method using local probability and hypothesis theory. To cluster the test data set, we analyze its local area using the local probability distribution and decide the candidate class of the data set using the mean, standard deviation, variance, and similar statistics. To decide the class of each test datum, statistical hypothesis theory is applied to the candidate class chosen for the test data set. For evaluation, the proposed classification method is compared with the conventional fuzzy c-means method, the k-means algorithm, and the discriminant analysis algorithm. The simulation results show that the proposed method is more accurate than the fuzzy c-means method, the k-means algorithm, and the discriminant analysis algorithm.

Fine-Grained Mobile Application Clustering Model Using Retrofitted Document Embedding

  • Yoon, Yeo-Chan;Lee, Junwoo;Park, So-Young;Lee, Changki
    • ETRI Journal / v.39 no.4 / pp.443-454 / 2017
  • In this paper, we propose a fine-grained mobile application clustering model using retrofitted document embedding. To automatically determine the clusters and their numbers with no predefined categories, the proposed model initializes the clusters based on title keywords and then merges similar clusters. For improved clustering performance, the proposed model distinguishes between an accurate clustering step with titles and an expansive clustering step with descriptions. During the accurate clustering step, an automatically tagged set is constructed as a result. This set is utilized to learn a high-performance document vector. During the expansive clustering step, more applications are then classified using this document vector. Experimental results showed that the purity of the proposed model increased by 0.19, and the entropy decreased by 1.18, compared with the K-means algorithm. In addition, the mean average precision improved by more than 0.09 in a comparison with a support vector machine classifier.
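The purity and entropy figures quoted above are external cluster-quality measures. The sketch below shows one common way to compute them. It assumes purity is the fraction of items that fall in their cluster's majority class, and entropy is the size-weighted average of each cluster's class entropy in bits. The paper's exact definitions may differ, and the function name is hypothetical.

```python
import math
from collections import Counter, defaultdict

def purity_and_entropy(cluster_ids, true_labels):
    """Return (purity, entropy) of a clustering against reference labels.

    purity  : higher is better, 1.0 when every cluster holds one class
    entropy : lower is better, 0.0 when every cluster holds one class
    """
    clusters = defaultdict(list)
    for c, t in zip(cluster_ids, true_labels):
        clusters[c].append(t)

    n = len(true_labels)
    purity = 0.0
    entropy = 0.0
    for members in clusters.values():
        counts = Counter(members)
        size = len(members)
        # Majority-class share of this cluster, weighted by cluster size.
        purity += max(counts.values()) / n
        # Shannon entropy (base 2) of the class distribution in this cluster.
        h = -sum((v / size) * math.log2(v / size) for v in counts.values())
        entropy += (size / n) * h
    return purity, entropy
```

For example, a single cluster containing two classes in equal proportion gives a purity of 0.5 and an entropy of 1 bit, while a clustering that separates the two classes perfectly gives a purity of 1 and an entropy of 0.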

Sea Cucumber (Stichopus japonicus) Grading System Based on Morphological Features during Rehydration Process (수화 시의 형태학적 특징에 따른 건해삼의 등급 분류 시스템 개발)

  • Lee, Choong Uk;Yoon, Won Byong
    • Journal of the Korean Society of Food Science and Nutrition / v.46 no.3 / pp.374-380 / 2017
  • Image analysis and k-means clustering were conducted to develop a grading system for dried sea cucumber (SC) based on rehydration rate. The SC images were obtained by taking pictures in a box under controlled light conditions. The region of interest was extracted to depict the shape of the SC in a 2D graph, and those 2D shapes were rendered to build a 3D model. The results from the image analysis provided the morphological features of the SC, including length, width, surface area, and volume, which were used to obtain the parameters of the k-means clustering weights. The k-means clustering classified the SC samples into three different grades. Each SC sample was rehydrated at $30^{\circ}C$ for 40 h. During rehydration, the flux of each grade was analyzed. Our study demonstrates that the mass transfer rate of SC increased as the surface area increased, and the grade of SC was classified based on rehydration rate. This study suggests that the optimal rehydration process for SC can be achieved by applying a suitable grading system.
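The grading step described above can be pictured with a plain k-means (Lloyd's algorithm) sketch that groups feature vectors into three clusters. This is an illustration under assumptions, not the authors' pipeline: the function name, the choice of initial centers, and the stopping rule are hypothetical, and the feature weighting mentioned in the abstract is not reproduced. Features measured on very different scales (for example, length versus volume) would usually be standardized first.

```python
import math

def kmeans(points, initial_centers, iters=100):
    """Lloyd's algorithm: assign each point to its nearest center, then
    move each center to the mean of its members, until labels stop changing."""
    centers = list(initial_centers)
    labels = None
    for _ in range(iters):
        new_labels = [
            min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centers are too
        labels = new_labels
        for i in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:  # keep the previous center if a cluster is empty
                centers[i] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centers
```

With three initial centers, each sample receives a label of 0, 1, or 2, which would correspond to the three grades. Which label maps to which grade would have to be decided afterwards, for instance by ordering the cluster centers by surface area.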

Adjustment of the Mean Field Rainfall Bias by Clustering Technique (레이더 자료의 군집화를 통한 Mean Field Rainfall Bias의 보정)

  • Kim, Young-Il;Kim, Tae-Soon;Heo, Jun-Haeng
    • Journal of Korea Water Resources Association / v.42 no.8 / pp.659-671 / 2009
  • A fuzzy c-means clustering technique is applied to improve the accuracy of the G/R ratio used for rainfall estimation from radar reflectivity. The G/R ratio is computed as the ratio of the ground rainfall recorded at AWS (Automatic Weather System) sites to the radar-estimated rainfall derived from the reflectivity of the Kwangduck Mt. radar station, which has a 100 km effective range. The G/R ratio is calculated by two methods: the first uses a single G/R ratio for the entire effective range, and the second uses two different G/R ratios for two regions formed by clustering analysis. Absolute relative error and root mean squared error are employed to evaluate the accuracy of the radar rainfall estimates obtained with each method. As a result, the radar rainfall estimated with two different G/R ratios from clustering analysis is more accurate than that estimated with a single G/R ratio for the entire range.
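The comparison described above can be sketched as follows. The definitions here are assumptions for illustration: the G/R ratio is taken as total gauge rainfall divided by total radar rainfall, and the absolute relative error as the absolute difference of totals divided by the gauge total. The paper may define either quantity differently. The region labels stand in for the output of a clustering step and all function names are hypothetical.

```python
import math

def gr_ratio(gauge, radar):
    """Mean field bias, assumed here to be sum(gauge) / sum(radar)."""
    return sum(gauge) / sum(radar)

def rmse(observed, estimated):
    """Root mean squared error between gauge values and adjusted radar values."""
    return math.sqrt(
        sum((o - e) ** 2 for o, e in zip(observed, estimated)) / len(observed)
    )

def absolute_relative_error(observed, estimated):
    """Assumed definition: |sum(estimated) - sum(observed)| / sum(observed)."""
    return abs(sum(estimated) - sum(observed)) / sum(observed)

def adjust_single(gauge, radar):
    """Scale every radar value by one G/R ratio for the whole range."""
    ratio = gr_ratio(gauge, radar)
    return [ratio * r for r in radar]

def adjust_by_region(gauge, radar, regions):
    """Scale each radar value by the G/R ratio of its own region."""
    ratios = {}
    for reg in set(regions):
        g = [gv for gv, lab in zip(gauge, regions) if lab == reg]
        r = [rv for rv, lab in zip(radar, regions) if lab == reg]
        ratios[reg] = gr_ratio(g, r)
    return [ratios[lab] * rv for rv, lab in zip(radar, regions)]
```

Calling `rmse(gauge, adjust_single(gauge, radar))` and `rmse(gauge, adjust_by_region(gauge, radar, regions))` on the same data gives the two figures that the study compares.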

Design of Meteorological Radar Pattern Classifier Using Clustering-based RBFNNs : Comparative Studies and Analysis (클러스터링 기반 RBFNNs를 이용한 기상레이더 패턴분류기 설계 : 비교 연구 및 해석)

  • Choi, Woo-Yong;Oh, Sung-Kwun
    • Journal of the Korean Institute of Intelligent Systems / v.24 no.5 / pp.536-541 / 2014
  • Data from meteorological radar include ground echo, sea-clutter echo, anomalous propagation echo, clear echo, and so on. Each of these is a kind of non-precipitation echo, and the characteristics of the individual echoes are analyzed in order to identify non-precipitation. The meteorological radar data are analyzed after a pre-processing procedure because the data are given as big data. In this study, an echo pattern classifier is designed to distinguish non-precipitation echoes from precipitation echoes in meteorological radar data using RBFNNs and an echo judgement module. Output performance is compared and analyzed for both HCM clustering-based RBFNNs and FCM clustering-based RBFNNs.

Soft Island Model based on K-means Clustering (K-Mean 군집을 기반으로 하는 소프트 아일랜드 모델)

  • Ichinkhorloo, Gotovsuren;Shin, Seong-Yoon;Lee, Hyun-Chang
    • Proceedings of the Korean Society of Computer Information Conference / 2020.07a / pp.561-562 / 2020
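  • In this study, multiple populations based on k-means clustering are proposed to realize an ensemble of multiple strategies. This yields a new DE variant, KSDE, in which similar individuals in the population apply the same mutation strategy and dissimilar subpopulations migrate information through a soft island model (SIM).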

Estimation of Defect Clustering Parameter Using Markov Chain Monte Carlo (Markov Chain Monte Carlo를 이용한 반도체 결함 클러스터링 파라미터의 추정)

  • Ha, Chung-Hun;Chang, Jun-Hyun;Kim, Joon-Hyun
    • Journal of Korean Society of Industrial and Systems Engineering / v.32 no.3 / pp.99-109 / 2009
  • The negative binomial yield model for semiconductor manufacturing has two parameters: the average number of defects per die and the clustering parameter. Estimating the clustering parameter is quite complex because the parameter has no clear closed form. In this paper, a Bayesian approach using Markov Chain Monte Carlo is proposed to estimate the clustering parameter. To find an appropriate estimation method for the clustering parameter, two typical estimators, the method of moments estimator and the maximum likelihood estimator, are compared with the proposed Bayesian estimator with respect to the mean absolute deviation between the real yield and the estimated yield. Experimental results show that both the proposed Bayesian estimator and the maximum likelihood estimator have excellent performance and that the choice of method depends on the purpose of use.
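As background for the two parameters named above, the sketch below evaluates the usual form of the negative binomial yield model, Y = (1 + λ/α)^(-α), where λ is the average number of defects per die and α is the clustering parameter, and computes a method of moments estimate of α from per-die defect counts. It does not implement the Bayesian MCMC estimator proposed in the paper. The function names and the use of the sample variance are assumptions of this example.

```python
def nb_yield(mean_defects, alpha):
    """Negative binomial yield: Y = (1 + lambda / alpha) ** (-alpha).

    A small alpha means strongly clustered defects. As alpha grows,
    the value approaches the Poisson yield exp(-lambda).
    """
    return (1.0 + mean_defects / alpha) ** (-alpha)

def moments_alpha(defect_counts):
    """Method of moments estimate: alpha = mean^2 / (variance - mean).

    Uses the sample variance (n - 1 denominator). The estimate is
    undefined when the counts show no overdispersion.
    """
    n = len(defect_counts)
    mean = sum(defect_counts) / n
    var = sum((x - mean) ** 2 for x in defect_counts) / (n - 1)
    if var <= mean:
        raise ValueError("variance does not exceed mean; alpha is undefined")
    return mean * mean / (var - mean)
```

The guard in `moments_alpha` reflects one practical difficulty with this estimator: when the observed variance is close to the mean, the estimate becomes unstable or undefined, which is part of why alternative estimators are of interest.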

Magnetoencephalography Interictal Spike Clustering in Relation with Surgical Outcome of Cortical Dysplasia

  • Jeong, Woorim;Chung, Chun Kee;Kim, June Sic
    • Journal of Korean Neurosurgical Society / v.52 no.5 / pp.466-471 / 2012
  • Objective : The aim of this study was to devise an objective clustering method for magnetoencephalography (MEG) interictal spike sources, and to identify the prognostic value of the new clustering method in adult epilepsy patients with cortical dysplasia (CD). Methods : We retrospectively analyzed 25 adult patients with histologically proven CD, who underwent MEG examination and surgical resection for intractable epilepsy. The mean postoperative follow-up period was 3.1 years. A hierarchical clustering method was adopted for MEG interictal spike source clustering. Clustered sources were then tested for their prognostic value toward surgical outcome. Results : Postoperative seizure outcome was Engel class I in 6 (24%), class II in 3 (12%), class III in 12 (48%), and class IV in 4 (16%) patients. With respect to MEG spike clustering, 12 of 25 (48%) patients showed 1 cluster, 2 (8%) showed 2 or more clusters within the same lobe, 10 (40%) showed 2 or more clusters in a different lobe, and 1 (4%) patient had only scattered spikes with no clustering. Patients who showed focal clustering achieved better surgical outcome than distributed cases (p=0.017). Conclusion : This is the first study that introduces an objective method to classify the distribution of MEG interictal spike sources. By using a hierarchical clustering method, we found that the presence of focal clustered spikes predicts a better postoperative outcome in epilepsy patients with CD.