Search | Korea Science

Finding the Number of Clusters and Various Experiments Based on ASA Clustering Method (ASA 군집화를 이용한 군집수 결정 및 다양한 실험)

Yoon Bok-Sik
- Journal of the Korean Operations Research and Management Science Society / v.31 no.2 / pp.87-98 / 2006
In many cluster analyses we are forced to cluster without any prior knowledge of the number of clusters, yet some clustering methods, such as the k-means algorithm, require the number of clusters in advance. In this study, we focus on the problem of determining the number of clusters in the given data. We follow the two-stage approach of the ASA clustering algorithm and mainly try to improve the performance of its first stage. We verify the usefulness of the method by applying it to various kinds of simulated data. We also apply the method to the clustering of two kinds of real-life qualitative data.

Hybrid Simulated Annealing for Data Clustering (데이터 클러스터링을 위한 혼합 시뮬레이티드 어닐링)

Kim, Sung-Soo;Baek, Jun-Young;Kang, Beom-Soo
- Journal of Korean Society of Industrial and Systems Engineering / v.40 no.2 / pp.92-98 / 2017
Data clustering determines groups of patterns in a dataset using a similarity measure and is one of the most important and difficult techniques in data mining. Clustering can be formally considered a particular kind of NP-hard grouping problem. The K-means algorithm, which is popular and efficient, is sensitive to initialization and can get stuck in a local optimum because it is a hill-climbing clustering method. This method is also not computationally feasible in practice, especially for large datasets and large numbers of clusters. Therefore, we need a robust and efficient clustering algorithm that finds the global optimum rather than a local optimum, especially now that much data is collected from many IoT (Internet of Things) devices. The objective of this paper is to propose a new Hybrid Simulated Annealing (HSA) method, which combines simulated annealing with K-means for non-hierarchical clustering of big data. Simulated annealing (SA) is useful for diversified search in a large search space, and K-means is useful for converged search in a predetermined search space. Our proposed method balances intensification and diversification to find the globally optimal solution in big data clustering. The performance of HSA is validated on the Iris, Wine, Glass, and Vowel datasets from the UCI Machine Learning Repository and compared with previous studies through experiment and analysis. Our proposed KSAK (K-means+SA+K-means) and SAK (SA+K-means) perform better than KSA (K-means+SA), SA, and K-means in our simulations. Our method significantly improves accuracy and efficiency in finding the globally optimal data clustering solution for complex, real-time, and costly data mining processes.
https://doi.org/10.11627/jkise.2017.40.2.092
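
The SAK combination named in the abstract (SA followed by K-means) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the authors' implementation: the function names `sse`, `anneal`, `kmeans_refine`, and `sak` are hypothetical, the SA move is a single-point reassignment, the cooling schedule is geometric, and the objective is the within-cluster sum of squared errors.

```python
import numpy as np

def sse(X, labels, k):
    """Sum of squared distances from each point to its cluster mean."""
    total = 0.0
    for c in range(k):
        members = X[labels == c]
        if len(members):  # empty clusters contribute nothing
            total += ((members - members.mean(axis=0)) ** 2).sum()
    return total

def anneal(X, k, rng, t0=1.0, cooling=0.995, steps=2000):
    """SA stage (diversification): reassign one random point per step."""
    labels = rng.integers(k, size=len(X))
    cost, temp = sse(X, labels, k), t0
    for _ in range(steps):
        i, new = rng.integers(len(X)), rng.integers(k)
        old = labels[i]
        if new == old:
            continue
        labels[i] = new
        cand = sse(X, labels, k)
        # Metropolis rule: always accept improvements, sometimes accept worse moves.
        if cand <= cost or rng.random() < np.exp((cost - cand) / temp):
            cost = cand
        else:
            labels[i] = old  # reject the move
        temp *= cooling
    return labels

def kmeans_refine(X, labels, k, iters=100):
    """K-means stage (intensification): Lloyd iterations from given labels."""
    labels = labels.copy()
    # Seed centroids from the labels; an empty cluster is seeded with a data point.
    centroids = np.array([X[labels == c].mean(axis=0) if np.any(labels == c) else X[c]
                          for c in range(k)])
    for _ in range(iters):
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        new_labels = dists.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
        for c in range(k):
            if np.any(labels == c):
                centroids[c] = X[labels == c].mean(axis=0)
    return labels

def sak(X, k, seed=0):
    """SA followed by K-means, in the spirit of the SAK variant."""
    rng = np.random.default_rng(seed)
    return kmeans_refine(X, anneal(X, k, rng), k)
```

In this sketch the K-means stage can only lower or keep the SSE reached by the SA stage, which is one way to read the "diversified search, then converged search" division of labor described above.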

Medoid Determination in Deterministic Annealing-based Pairwise Clustering

Lee, Kyung-Mi;Lee, Keon-Myung
- International Journal of Fuzzy Logic and Intelligent Systems / v.11 no.3 / pp.178-183 / 2011
The deterministic annealing-based clustering algorithm is an EM-based algorithm that behaves like the simulated annealing method yet is less sensitive to the initialization of parameters. Pairwise clustering is a clustering technique that uses inter-entity distance information and does not require detailed attribute information. The pairwise deterministic annealing-based clustering algorithm alternates between estimating the mean fields and updating the membership degrees of data objects to clusters until a termination condition holds. Lacking attribute value information, pairwise clustering algorithms do not explicitly determine the centroids or medoids of clusters, either during the clustering process or at its end. This paper proposes a method to identify medoids as the centers of the formed clusters for the pairwise deterministic annealing-based clustering algorithm. Experimental results show that the proposed method locates meaningful medoids.
https://doi.org/10.5391/IJFIS.2011.11.3.178
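
The abstract does not give the paper's medoid rule, so the following is only a generic sketch of how a medoid can be read off from pairwise distances and membership degrees alone. It assumes each object is assigned to its highest-membership cluster and picks, per cluster, the object with the smallest membership-weighted total distance to the other objects of that cluster. The function name `find_medoids` and this selection rule are illustrative assumptions.

```python
import numpy as np

def find_medoids(D, memberships):
    """Pick one medoid index per cluster from distances only.

    D           : (n, n) symmetric pairwise distance matrix
    memberships : (n, k) membership degrees of objects to clusters
    Returns a list of k object indices (-1 for a cluster with no objects).
    """
    D = np.asarray(D, dtype=float)
    memberships = np.asarray(memberships, dtype=float)
    labels = memberships.argmax(axis=1)  # harden the soft assignment
    medoids = []
    for c in range(memberships.shape[1]):
        idx = np.flatnonzero(labels == c)
        if len(idx) == 0:
            medoids.append(-1)
            continue
        # Membership-weighted total distance from each candidate to its cluster.
        cost = D[np.ix_(idx, idx)] @ memberships[idx, c]
        medoids.append(int(idx[cost.argmin()]))
    return medoids

# Six objects on a line, known only through their pairwise distances.
points = np.array([0.0, 1.0, 2.0, 10.0, 11.0, 12.0])
D = np.abs(points[:, None] - points[None, :])
memberships = np.array([[1, 0], [1, 0], [1, 0], [0, 1], [0, 1], [0, 1]])
medoids = find_medoids(D, memberships)
```

No attribute values are used after `D` is built, which matches the pairwise setting the abstract describes.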

Agglomerative Hierarchical Clustering Analysis with Deep Convolutional Autoencoders (합성곱 오토인코더 기반의 응집형 계층적 군집 분석)

Park, Nojin;Ko, Hanseok
- Journal of Korea Multimedia Society / v.23 no.1 / pp.1-7 / 2020
Clustering methods essentially take a two-step approach: extracting feature vectors for dimensionality reduction and then applying a clustering algorithm to the extracted feature vectors. However, for clustering images, traditional clustering methods such as stacked auto-encoder based k-means are not effective since they tend to ignore local information. In this paper, we propose a method that first reduces data dimensionality effectively using a convolutional auto-encoder to capture and reflect local information, and then accurately clusters similar data samples using a hierarchical clustering approach. The experimental results confirm that the proposed model improves the clustering results in terms of clustering accuracy and normalized mutual information.
https://doi.org/10.9717/kmms.2020.23.1.001

A Study of optimized clustering method based on SOM for CRM

Jong T. Rhee;Lee, Joon.
- Proceedings of the Korea Intelligent Information System Society Conference / 2001.01a / pp.464-469 / 2001
CRM (Customer Relationship Management) is an advanced marketing support system that analyzes customers' transaction data and classifies or targets customer groups to effectively increase market share and profit. Many engines have been developed to implement this function, and those for classification and clustering are considered the core ones. In this study, an improved clustering method based on SOM (Self-Organizing Maps) is proposed. The proposed clustering method finds the optimal number of clusters so that the effectiveness of clustering is increased. It considers all the data types existing in CRM data warehouses. In particular, an adaptive algorithm is used in which the concepts of degeneration and fusion are applied to find the optimal number of clusters. The feasibility and efficiency of the proposed method are demonstrated through simulation with simplified customer data.

Spectral clustering: summary and recent research issues (스펙트럴 클러스터링 - 요약 및 최근 연구동향)

Jeong, Sanghun;Bae, Suhyeon;Kim, Choongrak
- The Korean Journal of Applied Statistics / v.33 no.2 / pp.115-122 / 2020
K-means clustering uses a spherical or elliptical metric to group data points; however, it does not work well for non-convex data such as concentric circles. Spectral clustering, based on graph theory, is a generalized and robust technique for dealing with non-standard types of data such as non-convex data. Results obtained by spectral clustering often outperform those of traditional clustering such as K-means. In this paper, we review spectral clustering and discuss important issues in it, such as determining the number of clusters K, estimating the scale parameter in the adjacency of two points, and dimension reduction in clustering high-dimensional data.
https://doi.org/10.5351/KJAS.2020.33.2.115
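
The concentric-circles case mentioned in the abstract can be sketched with a two-way spectral split. This is a generic textbook-style illustration, not the procedure reviewed in the paper: it assumes a Gaussian adjacency with scale parameter `sigma`, the symmetric normalized graph Laplacian, and a sign split on the second eigenvector. The function names `spectral_bipartition` and `ring`, the ring radii, and `sigma=0.5` are assumptions chosen for the example.

```python
import numpy as np

def spectral_bipartition(X, sigma):
    """Split points into two groups using the second Laplacian eigenvector."""
    # Gaussian adjacency between every pair of points; sigma is the scale parameter.
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    W = np.exp(-sq / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)  # no self-loops
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    # Symmetric normalized Laplacian: I - D^{-1/2} W D^{-1/2}.
    L = np.eye(len(X)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L)  # eigenvalues are returned in ascending order
    fiedler = vecs[:, 1] * d_inv_sqrt  # undo the D^{1/2} scaling
    return (fiedler > 0).astype(int)

def ring(radius, n):
    """n evenly spaced points on a circle of the given radius."""
    angles = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
    return np.column_stack([radius * np.cos(angles), radius * np.sin(angles)])

# Two concentric circles: non-convex clusters that share the same centroid.
X = np.vstack([ring(1.0, 40), ring(3.0, 40)])
labels = spectral_bipartition(X, sigma=0.5)
```

The choice of `sigma` matters: if it is too large, the two rings become strongly connected in the graph and the split may cut across a ring instead of between them, which is one reason the scale parameter is listed among the open issues.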

Hierarchical Clustering of Gene Expression Data Based on Self Organizing Map (자기 조직화 지도에 기반한 유전자 발현 데이터의 계층적 군집화)

Park, Chang-Beom;Lee, Dong-Hwan;Lee, Seong-Whan
- Proceedings of the Korean Society for Bioinformatics Conference / 2003.10a / pp.170-177 / 2003
Gene expression data are quantitative measurements of the expression levels and ratios of numerous genes in different situations, based on microarray image analysis results. The process of drawing meaningful information related to genomic diseases and various biological activities from gene expression data is known as gene expression data analysis. In this paper, we present a hierarchical clustering method for gene expression data based on the self-organizing map, which can analyze the clustering result of gene expression data more efficiently. Using our proposed method, we could eliminate the uncertainty of cluster boundaries, which is an inherent disadvantage of the self-organizing map, and use the visualization function of hierarchical clustering. We could also process massive data using the fast processing speed of the self-organizing map and interpret its clustering result more efficiently and in a more user-friendly way. To verify the efficiency of our proposed algorithm, we performed tests with the following three data sets: an animal feature data set, a yeast gene expression data set, and a leukemia gene expression data set. The results demonstrated the feasibility and utility of the proposed clustering algorithm.

An Improved K-means Document Clustering using Concept Vectors

Shin, Yang-Kyu
- Journal of the Korean Data and Information Science Society / v.14 no.4 / pp.853-861 / 2003
An improved K-means document clustering method is presented, in which a concept vector is manipulated for each cluster on the basis of the cosine similarity of text documents. The concept vectors are unit vectors that have been normalized on the n-dimensional sphere. Because the standard K-means method is sensitive to the initial starting conditions, our improvement focuses on the starting conditions for estimating the modes of a distribution. The improved K-means clustering algorithm was applied to a set of text documents, called Classic3, to test the efficiency and correctness of the clustering result, and showed a 7% improvement in its worst case.
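
The concept-vector idea in the abstract (unit-normalized cluster centers compared to documents by cosine similarity) can be sketched as follows. This is a plain spherical K-means outline under stated assumptions, not the paper's improved method: the function name `spherical_kmeans` is hypothetical, documents are assumed to be nonzero, nonnegative term-frequency rows, and the evenly spaced initialization is a placeholder, not the paper's improved starting condition.

```python
import numpy as np

def spherical_kmeans(docs, k, iters=50):
    """Cluster document vectors by cosine similarity to unit concept vectors."""
    docs = np.asarray(docs, dtype=float)
    # Project every document onto the unit sphere (rows assumed nonzero).
    X = docs / np.linalg.norm(docs, axis=1, keepdims=True)
    # Placeholder initialization: k evenly spaced documents as starting concepts.
    concepts = X[np.linspace(0, len(X) - 1, k).astype(int)].copy()
    labels = np.full(len(X), -1)
    for _ in range(iters):
        # For unit vectors the dot product equals the cosine similarity.
        new_labels = (X @ concepts.T).argmax(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
        for c in range(k):
            members = X[labels == c]
            if len(members):
                mean = members.mean(axis=0)
                norm = np.linalg.norm(mean)
                if norm > 0:
                    concepts[c] = mean / norm  # concept vector: normalized mean
    return labels, concepts

# Toy term-frequency rows: three documents about topic A, three about topic B.
docs = [[3, 0, 1], [2, 0, 0], [4, 1, 0], [0, 2, 0], [0, 3, 1], [1, 5, 0]]
labels, concepts = spherical_kmeans(docs, k=2)
```

Because every concept vector is renormalized after averaging, it stays on the unit sphere, which is the property the abstract attributes to concept vectors.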

A Study on Process Data Compression Method by Clustering Method (클러스터링 기법을 이용한 공정 데이터의 압축 저장 기법에 관한 연구)

Kim Yoonsik;Mo Kyung Joo;Yoon En Sup
- Journal of the Korean Institute of Gas / v.4 no.4 s.12 / pp.58-64 / 2000
Data compression and retrieval methods are investigated for the effective utilization of measured process data. In this paper, a new data compression method, Clustering Compression (CC), which is based on the k-means clustering algorithm and the piecewise linear approximation method, is suggested. Case studies on industrial data sets showed the superior performance of clustering-based techniques compared to other conventional methods and showed that CC could handle the compression of multi-dimensional data.

A Study on Density-Based Clustering Method Considering Directionality (방향성을 고려한 밀도 기반 클러스터링 기법에 관한 연구)

Jinman Kim;Joongjin Kook
- Journal of the Semiconductor & Display Technology / v.23 no.2 / pp.38-44 / 2024
This research proposes DBSCAN-D, a clustering technique, based on existing density-based clustering research, for locating points of interest (POI) in data generated by moving objects, such as GPS data. The method is designed around the 'staying time' and 'directionality' extracted from the relationship between GPS data points. The staying time can be extracted from the difference between the times at which successive GPS data points are received. Directionality can be expressed by shifting the area of later-generated data toward the position of previously generated data, focusing on the fact that GPS data are generated sequentially. Through these two properties, it is possible to perform clustering suited to data sets generated by moving objects.

Search Result 2,747, Processing Time 0.032 seconds
