Search | Korea Science

On the clustering of huge categorical data

Kim, Dae-Hak
- Journal of the Korean Data and Information Science Society / v.21 no.6 / pp.1353-1359 / 2010
The basic objective of cluster analysis is to discover natural groupings of items. In general, clustering is based on a similarity (or dissimilarity) matrix or on the original input data, and various measures of similarity between objects have been developed. In this paper, we consider the clustering of a huge real categorical data set that describes the time-location-activity patterns of Korean people. Useful similarity measures for the categorical variables in this data set are developed and adopted. Hierarchical and nonhierarchical clustering methods are applied to the data set, which is huge and consists of many categorical variables.
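The abstract does not say which similarity measures the paper develops. As a minimal sketch of how a dissimilarity matrix over categorical records can be built, the example below uses the standard simple matching dissimilarity (the fraction of attributes on which two records disagree). The function names and the toy time-location-activity records are hypothetical, chosen only for illustration.

```python
from typing import List, Sequence


def simple_matching_dissimilarity(a: Sequence[str], b: Sequence[str]) -> float:
    """Fraction of categorical attributes on which two records disagree."""
    if len(a) != len(b):
        raise ValueError("records must have the same number of attributes")
    if not a:
        raise ValueError("records must have at least one attribute")
    mismatches = sum(1 for x, y in zip(a, b) if x != y)
    return mismatches / len(a)


def dissimilarity_matrix(records: Sequence[Sequence[str]]) -> List[List[float]]:
    """Full pairwise dissimilarity matrix (symmetric, zero diagonal)."""
    n = len(records)
    return [
        [simple_matching_dissimilarity(records[i], records[j]) for j in range(n)]
        for i in range(n)
    ]


# Hypothetical (time, location, activity) records; illustrative values only.
records = [
    ("morning", "home", "sleep"),
    ("morning", "home", "meal"),
    ("evening", "office", "work"),
]
matrix = dissimilarity_matrix(records)
```

A matrix like this can be passed to any hierarchical method that accepts precomputed dissimilarities. For a data set as large as the one described, the full pairwise matrix may not fit in memory, which is one reason nonhierarchical methods are often applied alongside.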

Deduction of Acupoints Selecting Elements on Zhenjiuzishengjing using hierarchical clustering (계층적 군집분석(hierarchical clustering)을 통한 침구자생경(鍼灸資生經) 경혈 선택 요인 분석)

Oh, Junho
- Journal of Haehwa Medicine / v.23 no.1 / pp.115-124 / 2014
Objectives: Traditional East Asian medicine (TEAM) has many medical records of acupuncture and moxibustion. We performed this study to find the hidden criteria in these records for choosing proper acupoints. Methods: "Zhenjiuzishengjing", an ancient TEAM book, was analysed using document clustering techniques. A corpus was built from this book, containing 196 texts derived from individual symptoms. Each text was converted to a vector representing the frequencies of 349 acupoints. Distances between vectors were calculated by the weighted Euclidean distance method, and a hierarchical clustering of symptoms was built from these distances. Results: The clustering consisted of five large groups, which were highly correlated with body parts: head and face, chest, abdomen, upper extremity, lower extremity, and back. Conclusions: The results suggest that the body part of a symptom is the most important criterion in acupoint selection; some highly similar symptom vectors supported this result. The other criterion is the cause and pathway of illness: some symptoms that shared a common cause and pathway were bound together.
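The pipeline described in the Methods (frequency vectors, weighted Euclidean distance, hierarchical clustering) can be sketched as follows. This is a toy illustration under stated assumptions: the abstract does not give the weights or the linkage rule, so uniform weights and average linkage are used here, and the four vectors over three hypothetical acupoints are invented for the example.

```python
import math
from typing import List, Sequence


def weighted_euclidean(u: Sequence[float], v: Sequence[float],
                       w: Sequence[float]) -> float:
    """Weighted Euclidean distance: sqrt(sum_i w_i * (u_i - v_i)^2)."""
    return math.sqrt(sum(wi * (ui - vi) ** 2 for ui, vi, wi in zip(u, v, w)))


def average_linkage(vectors: Sequence[Sequence[float]],
                    weights: Sequence[float],
                    n_clusters: int) -> List[List[int]]:
    """Naive agglomerative clustering; average linkage is an assumption."""
    clusters = [[i] for i in range(len(vectors))]

    def dist(a: List[int], b: List[int]) -> float:
        # Mean pairwise distance between members of the two clusters.
        total = sum(weighted_euclidean(vectors[i], vectors[j], weights)
                    for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > n_clusters:
        best = None
        for p in range(len(clusters)):
            for q in range(p + 1, len(clusters)):
                d = dist(clusters[p], clusters[q])
                if best is None or d < best[0]:
                    best = (d, p, q)
        _, p, q = best
        clusters[p] = clusters[p] + clusters[q]  # merge the closest pair
        del clusters[q]
    return clusters


# Toy symptom vectors: counts of three hypothetical acupoints per symptom.
vectors = [[3, 0, 0], [2, 1, 0], [0, 0, 4], [0, 1, 3]]
weights = [1.0, 1.0, 1.0]  # uniform weights, assumed for illustration
groups = average_linkage(vectors, weights, n_clusters=2)
```

Symptoms whose texts mention the same acupoints end up in the same group, which is the mechanism behind the body-part grouping reported in the Results.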

Classification of network packets using hierarchical clustering (Hierarchical Clustering을 이용한 네트워크 패킷의 분류)

Yeo, Insung;Hai, Quan Tran;Hwang, Seong Oun
- Journal of Internet of Things and Convergence / v.3 no.1 / pp.9-11 / 2017
Recently, with the widespread use of the Internet and mobile devices, the number of network-based attacks by hackers has been increasing. When devices communicate over a network, they exchange packets, which contain various kinds of information. We analyze the information in these packets using hierarchical clustering and classify packets as normal or abnormal to detect attacks. With this analysis method, it will be possible to detect attacks by analyzing new packets.
https://doi.org/10.20465/KIOTS.2017.3.1.009

Results of Discriminant Analysis with Respect to Cluster Analyses Under Dimensional Reduction

Chae, Seong-San
- Communications for Statistical Applications and Methods / v.9 no.2 / pp.543-553 / 2002
Principal component analysis is applied to reduce $p$ dimensions to $q$ dimensions ($q \leq p$). Each partition of a collection of data points with $p$ and $q$ variables, generated by applying six hierarchical clustering methods, is re-classified by discriminant analysis. From the application of discriminant analysis to each hierarchical clustering method, correct classification ratios are obtained. The results illustrate which method is more reasonable in exploratory data analysis.
https://doi.org/10.5351/CKSS.2002.9.2.543

Automatic Categorization of Real World FAQs Using Hierarchical Document Clustering (계층적 문서 클러스터링을 이용한 실세계 질의 메일의 자동 분류)

류중원;조성배
- Proceedings of the Korean Institute of Intelligent Systems Conference / 2001.05a / pp.187-190 / 2001
Due to the recent proliferation of the Internet, it is widely accepted that the need for automatic document categorization is rising. Since processing and classifying documents manually is very time-consuming and takes too much manpower, we need a system that categorizes them automatically according to their contents. In this paper, we propose an automatic e-mail response system based on two hierarchical document clustering methods. The first clusters the whole document set into 3 groups so that a first classifier assigns each input document to the corresponding group, and then obtains the final result from a classifier trained separately within each class. The second successively classifies the most distinct classes first, according to their similarity. Neural networks are adopted as classifiers, and dendrograms are used to show the hierarchical similarities between classes. A comparison of the performance of hierarchical and non-hierarchical classifiers shows that the clustering methods improve classification efficiency.

Two Phase Hierarchical Clustering Algorithm for Group Formation in Data Mining (데이터 마이닝에서 그룹 세분화를 위한 2단계 계층적 글러스터링 알고리듬)

황인수
- Korean Management Science Review / v.19 no.1 / pp.189-196 / 2002
Data clustering is often one of the first steps in data mining analysis. It identifies groups of related objects that can be used as a starting point for exploring further relationships. This technique supports the development of population segmentation models, such as demographic-based customer segmentation. The purpose of this paper is to present a two-phase hierarchical clustering algorithm for group formation. Applications of the algorithm to product-customer group formation in customer relationship management are also discussed. In computer simulations, the suggested algorithm outperforms the single-link method and k-means clustering.

An Incremental Similarity Computation Method in Agglomerative Hierarchical Clustering

Jung, Sung-young;Kim, Taek-soo
- Journal of the Korean Institute of Intelligent Systems / v.11 no.7 / pp.579-583 / 2001
In data clustering in high dimensional space, one of the difficulties is the time-consuming computation of vector similarities. It becomes worse for the agglomerative algorithm with the group-average link and mean centroid methods, because the cluster similarity must be recomputed whenever the cluster center moves after a merging step. As a solution to this problem, we present an incremental method of similarity computation, which substitutes scalar calculation for the time-consuming calculation of vector similarity, for several measures such as the squared distance, inner product, cosine, and minimum variance. Experimental results show that it significantly speeds up clustering for very high dimensional data.
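The abstract does not give the paper's update formulas. One well-known identity of the same kind is the Lance-Williams update for the centroid method: after clusters i and j merge, the squared distance from any other centroid k to the merged centroid follows from three already-stored scalars, with no vector arithmetic. The sketch below checks that identity numerically; the function names and sample points are illustrative assumptions, and it should not be read as the authors' method.

```python
from typing import List, Sequence


def sq_dist(u: Sequence[float], v: Sequence[float]) -> float:
    """Squared Euclidean distance; cost grows with the dimensionality."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def centroid(points: Sequence[Sequence[float]]) -> List[float]:
    """Mean vector of a cluster."""
    n = len(points)
    return [sum(coords) / n for coords in zip(*points)]


def merged_sq_dist(d_ki: float, d_kj: float, d_ij: float,
                   n_i: int, n_j: int) -> float:
    """Lance-Williams centroid update: scalar-only, independent of dimension."""
    n = n_i + n_j
    return (n_i / n) * d_ki + (n_j / n) * d_kj - (n_i * n_j / n ** 2) * d_ij


# Illustrative clusters (2-D here; the saving matters in high dimensions).
cluster_i = [[0.0, 0.0], [2.0, 0.0]]
cluster_j = [[4.0, 4.0]]
cluster_k = [[10.0, 0.0], [10.0, 2.0]]

c_i, c_j, c_k = centroid(cluster_i), centroid(cluster_j), centroid(cluster_k)

# Incremental: reuse the three stored squared distances.
incremental = merged_sq_dist(sq_dist(c_k, c_i), sq_dist(c_k, c_j),
                             sq_dist(c_i, c_j), len(cluster_i), len(cluster_j))

# Direct: recompute the merged centroid and a full vector distance.
direct = sq_dist(c_k, centroid(cluster_i + cluster_j))
```

Both routes give the same value, but the incremental one costs a constant number of scalar operations per cluster pair regardless of the vector dimension.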

A Simple Tandem Method for Clustering of Multimodal Dataset

Cho C.;Lee J.W.;Lee J.W.
- Proceedings of the Korean Operations and Management Science Society Conference / 2003.05a / pp.729-733 / 2003
The presence of local features within clusters, caused by the multi-modal nature of the data, prevents many conventional clustering techniques from working properly. In particular, clustering datasets with non-Gaussian distributions within a cluster can be problematic when the technique implicitly assumes a Gaussian distribution. The current study proposes a simple tandem clustering method, composed of a k-means type algorithm and a hierarchical method, to solve such problems. The multi-modal dataset is first divided into many small pre-clusters by the k-means or fuzzy k-means algorithm. The pre-clusters found in the first step are then clustered again using an agglomerative hierarchical clustering method with Kullback-Leibler divergence as the measure of dissimilarity. This method is not only effective at extracting multi-modal clusters but also fast and simple in terms of computational complexity, and relatively robust in the presence of outliers. The performance of the proposed method was evaluated on three generated datasets and six publicly known real-world datasets.
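The abstract does not state how the Kullback-Leibler divergence between pre-clusters is estimated. A minimal sketch of the second stage's dissimilarity, assuming each small pre-cluster is summarized by a one-dimensional Gaussian (an assumption of this example, not a claim about the paper), uses the closed form $KL(p\|q) = \frac{1}{2}\left[\ln\frac{\sigma_q^2}{\sigma_p^2} + \frac{\sigma_p^2 + (\mu_p - \mu_q)^2}{\sigma_q^2} - 1\right]$ and symmetrizes it so it can serve as a merge criterion. The pre-cluster values are invented.

```python
import math
from statistics import fmean, pvariance
from typing import Sequence


def gaussian_kl(mu_p: float, var_p: float, mu_q: float, var_q: float) -> float:
    """KL(p || q) for univariate Gaussians (closed form)."""
    return 0.5 * (math.log(var_q / var_p)
                  + (var_p + (mu_p - mu_q) ** 2) / var_q
                  - 1.0)


def symmetric_kl(a: Sequence[float], b: Sequence[float]) -> float:
    """Symmetrized KL between Gaussian summaries of two pre-clusters.

    Assumes each pre-cluster has nonzero variance.
    """
    mu_a, var_a = fmean(a), pvariance(a)
    mu_b, var_b = fmean(b), pvariance(b)
    return (gaussian_kl(mu_a, var_a, mu_b, var_b)
            + gaussian_kl(mu_b, var_b, mu_a, var_a))


# Hypothetical pre-clusters, as if produced by a first k-means pass.
pre_clusters = {
    "A": [0.9, 1.0, 1.1, 1.2],
    "B": [1.3, 1.4, 1.5, 1.6],
    "C": [8.0, 8.2, 8.4, 8.6],
}

d_ab = symmetric_kl(pre_clusters["A"], pre_clusters["B"])
d_ac = symmetric_kl(pre_clusters["A"], pre_clusters["C"])
```

With these values, A and B are much closer than A and C, so an agglomerative pass would merge A with B first. Neighbouring pre-clusters join into one cluster even when the merged shape is not itself Gaussian.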

A Study on Cluster Hierarchy Depth in Hierarchical Clustering (계층적 클러스터링에서 분류 계층 깊이에 관한 연구)

Jin, Hai-Nan;Lee, Shin-won;An, Dong-Un;Chung, Sung-Jong
- Proceedings of the Korea Information Processing Society Conference / 2004.05a / pp.673-676 / 2004
Fast and high-quality document clustering algorithms play an important role in data exploration by organizing large amounts of information into a small number of meaningful clusters. In particular, hierarchical clustering provides a view of the data at different levels, adapting large document collections to people's intuitive needs and interests. Many papers have shown that hierarchical clustering performs well but is limited by its quadratic time complexity. In contrast, K-means has a time complexity that is linear in the number of documents, but is thought to produce inferior clusters. Considering simplicity, quality, and efficiency, we combine the two approaches in a new system named CONDOR [10], which builds a hierarchical structure on document clustering with the K-means algorithm to "get the best of both worlds". The performance of the CONDOR system is compared with that of the VIVISIMO hierarchical clustering system [9], and is analyzed with respect to feature word selection for specific topics and the optimum hierarchy depth.

An Energy Consumption Model using Hierarchical Unequal Clustering Method (계층적 불균형 클러스터링 기법을 이용한 에너지 소비 모델)

Kim, Jin-Su;Shin, Seung-Soo
- Journal of the Korea Academia-Industrial cooperation Society / v.12 no.6 / pp.2815-2822 / 2011
In wireless sensor networks, clustering is a technique that forms clusters to aggregate data and transmit them together so that energy is used efficiently. In this paper, I propose a hierarchical unequal clustering method using a cluster group model. The method divides the entire network into two layers. The data aggregated in layer 2, which consists of cluster groups, is sent to layer 1, and after re-aggregation the total data is sent to the base station. This method decreases overall energy consumption by using the cluster group model with a multi-hop communication architecture. The hot spot problem can be solved by forming unequal clusters. I also show that the proposed hierarchical unequal clustering method is better than the previous clustering method in terms of network energy efficiency.
https://doi.org/10.5762/KAIS.2011.12.6.2815

